
Commit c8f282b

chore(other): deno fmt
1 parent c831f34


62 files changed (+2454, -1455 lines)

.vscode/settings.json

Lines changed: 1 addition & 1 deletion

````diff
@@ -1,3 +1,3 @@
 {
 "claudeCodeChat.permissions.yoloMode": true
-}
+}
````

apps/classify/workflow/BATCHING.md

Lines changed: 15 additions & 4 deletions

````diff
@@ -23,11 +23,13 @@ INTERNAL_EMIT_DELAY_MS=100
 ### Script Defaults
 
 All scripts use these defaults:
+
 - **Batch size:** 5 indicators per batch
 - **Concurrent batches:** 4 batches running in parallel
 - **Total concurrency:** 20 indicators processing simultaneously
 
 This is configured in:
+
 - [scripts/run-random.ts](./scripts/run-random.ts) (lines 418-419)
 - [scripts/run-all.ts](./scripts/run-all.ts) (lines 417-418)
 
@@ -61,6 +63,7 @@ This is configured in:
 ## Usage Examples
 
 ### Run Random Indicators
+
 ```bash
 # Process 20 random indicators (4 batches × 5)
 deno task run:random -- -20 openai
@@ -70,6 +73,7 @@ deno task run:random -- -100 openai
 ```
 
 ### Run All Indicators
+
 ```bash
 # Process all indicators (in groups of 20)
 deno task run:all openai
@@ -81,12 +85,13 @@ deno task run:all 40 openai
 ## Performance Tuning
 
 ### Increase Concurrency
+
 To process **40 indicators concurrently** (8 batches of 5):
 
 1. Update scripts:
 ```typescript
 const batchSize = 5;
-const concurrentBatches = 8; // Changed from 4
+const concurrentBatches = 8; // Changed from 4
 ```
 
 2. Ensure your system can handle it:
@@ -95,15 +100,17 @@ To process **40 indicators concurrently** (8 batches of 5):
 - Database connection pool may need adjustment
 
 ### Reduce Concurrency
+
 To process **10 indicators concurrently** (2 batches of 5):
 
 1. Update scripts:
 ```typescript
 const batchSize = 5;
-const concurrentBatches = 2; // Changed from 4
+const concurrentBatches = 2; // Changed from 4
 ```
 
 ### Change Batch Size
+
 To use larger batches (e.g., 10 indicators per batch):
 
 1. Update `.env`:
@@ -113,8 +120,8 @@ To use larger batches (e.g., 10 indicators per batch):
 
 2. Update scripts:
 ```typescript
-const batchSize = 10; // Changed from 5
-const concurrentBatches = 2; // Adjust to maintain total concurrency
+const batchSize = 10; // Changed from 5
+const concurrentBatches = 2; // Adjust to maintain total concurrency
 ```
 
 ## Monitoring
@@ -190,18 +197,22 @@ CREATE TABLE pipeline_stats (
 ## Troubleshooting
 
 ### "Too many concurrent requests"
+
 - Reduce `concurrentBatches` in scripts
 - Increase `INTERNAL_EMIT_DELAY_MS` in `.env`
 
 ### "Out of memory"
+
 - Reduce `concurrentBatches` (fewer indicators processing simultaneously)
 - Use smaller LLM model in LM Studio
 
 ### "Database locked"
+
 - SQLite handles concurrency well with WAL mode (enabled by default)
 - If issues persist, consider switching to PostgreSQL for production
 
 ### Batches not completing
+
 - Check logs for errors in individual steps
 - Query `processing_log` table for failed stages
 - Increase timeout in `waitForBatchCompletion()` if needed
````
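
The BATCHING.md diff above documents a batch-of-5, four-batches-in-parallel pattern. As an illustration only, here is a minimal TypeScript sketch of that pattern; `processIndicator` and `runInBatches` are hypothetical stand-ins for whatever the real scripts in scripts/run-random.ts and scripts/run-all.ts do per indicator, and none of these names come from the repository.

```typescript
// Sketch of the documented batching pattern: batches of `batchSize` indicators,
// with `concurrentBatches` batches in flight at once (5 × 4 = 20 concurrent).
// `processIndicator` is hypothetical; it stands in for the real pipeline call.
async function processIndicator(id: string): Promise<void> {
  console.log(`processing ${id}`);
}

const batchSize = 5; // indicators per batch
const concurrentBatches = 4; // batches running in parallel

async function runInBatches(ids: string[]): Promise<void> {
  // Split the indicator ids into batches of `batchSize`.
  const batches: string[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }

  // Run `concurrentBatches` batches at a time; each batch processes its
  // indicators in parallel, so total concurrency is batchSize × concurrentBatches.
  for (let i = 0; i < batches.length; i += concurrentBatches) {
    const group = batches.slice(i, i + concurrentBatches);
    await Promise.all(
      group.map((batch) => Promise.all(batch.map(processIndicator))),
    );
  }
}

await runInBatches(Array.from({ length: 40 }, (_, i) => `indicator-${i}`));
```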

apps/classify/workflow/README.md

Lines changed: 11 additions & 0 deletions

````diff
@@ -378,6 +378,7 @@ Results are stored in the following Motia state groups:
 Deploy the workflow service to Railway for production-scale processing with horizontal scaling and PostgreSQL/TimescaleDB persistence.
 
 **Performance:**
+
 - **Local (M3)**: ~30-40 indicators/min
 - **Railway (3 replicas)**: ~150 indicators/min (5-6× faster)
 - **10,903 indicators**: ~73 minutes on Railway vs ~6 hours locally
@@ -439,12 +440,14 @@ NODE_ENV=production
 ### API Usage
 
 **Health Check:**
+
 ```bash
 curl https://your-service.up.railway.app/health
 # Response: {"status":"ok","timestamp":"...","service":"classify-workflow"}
 ```
 
 **Classify Batch:**
+
 ```bash
 curl -X POST https://your-service.up.railway.app/classify/batch \
 -H "Content-Type: application/json" \
@@ -457,23 +460,27 @@ curl -X POST https://your-service.up.railway.app/classify/batch \
 ### Rate Limits & Scaling
 
 **OpenAI GPT-4o-mini Tier 1:**
+
 - TPM: 200,000 tokens/minute
 - Per indicator: ~1,000 tokens (2-3 LLM calls)
 - Max throughput: ~200 indicators/minute
 
 **Recommended Configuration:**
+
 - 3 replicas × 50 concurrent each = 150 concurrent total
 - TPM usage: 75% (150K/200K)
 - 25% headroom for variance/retries
 
 **Scaling:**
+
 - 2 replicas: ~100 indicators/min (safe, 50% TPM)
 - 3 replicas: ~150 indicators/min (recommended, 75% TPM)
 - 4 replicas: ~200 indicators/min (max Tier 1, 100% TPM)
 
 ### Monitoring
 
 **Track Performance:**
+
 ```sql
 -- View batch statistics
 SELECT * FROM pipeline_stats ORDER BY batch_start_time DESC LIMIT 10;
@@ -489,23 +496,27 @@ FROM classifications;
 ```
 
 **Railway Metrics:**
+
 - Service health via `/health` endpoint
 - CPU/Memory usage in Railway dashboard
 - Request rate and latency
 
 ### Cost Analysis
 
 **API Costs (OpenAI GPT-4o-mini):**
+
 - Per indicator: ~$0.00382
 - 10,903 indicators: ~$42
 - Same cost regardless of replicas!
 
 **Infrastructure (Railway):**
+
 - Workflow service: ~$10-20/month (3 replicas)
 - Postgres: ~$10-20/month
 - Total: ~$20-40/month
 
 **Time Savings:**
+
 - Local: ~6 hours per 10.9K run
 - Railway: ~1.2 hours per 10.9K run
 - Saves: ~4.8 hours per run
````
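
As a quick sanity check on the Rate Limits & Scaling figures in the README diff above, the arithmetic can be reproduced in a few lines of TypeScript. The constants simply restate the documented Tier 1 numbers (200K TPM, ~1,000 tokens per indicator, 50 concurrent per replica); this snippet is illustrative and is not part of the service code.

```typescript
// Reproduces the documented Tier 1 scaling arithmetic. Each concurrent slot is
// treated as roughly one indicator per minute, as the README figures imply.
const tpmLimit = 200_000; // OpenAI GPT-4o-mini Tier 1 tokens/minute
const tokensPerIndicator = 1_000; // ~2-3 LLM calls per indicator
const concurrentPerReplica = 50;

const maxIndicatorsPerMinute = tpmLimit / tokensPerIndicator;
console.log(`Tier 1 ceiling: ~${maxIndicatorsPerMinute} indicators/min`);

for (const replicas of [2, 3, 4]) {
  const indicatorsPerMinute = replicas * concurrentPerReplica;
  const tpmShare = (indicatorsPerMinute * tokensPerIndicator) / tpmLimit;
  console.log(
    `${replicas} replicas: ~${indicatorsPerMinute} indicators/min, ` +
      `${Math.round(tpmShare * 100)}% of TPM`,
  );
}
// 2 replicas -> ~100/min (50%), 3 -> ~150/min (75%), 4 -> ~200/min (100%)
```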

apps/classify/workflow/SQLITE_MIGRATION_SUMMARY.md

Lines changed: 27 additions & 2 deletions

````diff
@@ -12,6 +12,7 @@
 - **PostgreSQL:** Uses `$1`, `$2`, etc. placeholders
 
 **Files Updated:**
+
 - [src/db/repository.ts](./src/db/repository.ts) - All query methods now detect database type and use appropriate placeholders
 - [src/db/schema.ts](./src/db/schema.ts) - Added `metadata` column to `processing_log` table
 - [steps/classify-flow/complete-classify.step.ts](./steps/classify-flow/complete-classify.step.ts) - Fixed old `db.prepare()` calls, converted booleans to integers, JSON-stringified arrays/objects
@@ -21,6 +22,7 @@
 **Problem:** SQLite only accepts primitives (numbers, strings, bigints, buffers, null) but we were trying to bind JavaScript booleans and objects.
 
 **Solution:**
+
 - **Booleans → Integers:** All boolean fields (`is_cumulative`, `is_currency_denominated`, `boolean_review_passed`) now convert to 0/1
 - **Objects/Arrays → JSON Strings:** Fields like `boolean_review_fields_wrong` and `final_review_corrections` are JSON-stringified before saving
 
@@ -29,6 +31,7 @@
 **Problem:** `processing_log` table was missing the `metadata` column that was added to PostgreSQL.
 
 **Solution:**
+
 ```bash
 sqlite3 ./data/classify-workflow-local-dev.db "ALTER TABLE processing_log ADD COLUMN metadata TEXT;"
 ```
@@ -40,6 +43,7 @@ sqlite3 ./data/classify-workflow-local-dev.db "ALTER TABLE processing_log ADD CO
 **Solution:** Reduced to **1 concurrent batch** (5 indicators at a time):
 
 **Files Updated:**
+
 - [scripts/run-random.ts](./scripts/run-random.ts) - `concurrentBatches = 1`
 - [scripts/run-all.ts](./scripts/run-all.ts) - `concurrentBatches = 1`
 
@@ -48,6 +52,7 @@ sqlite3 ./data/classify-workflow-local-dev.db "ALTER TABLE processing_log ADD CO
 **Problem:** When indicators got stuck, the script would wait forever with no feedback.
 
 **Solution:** Added smart progress detection:
+
 - Detects when no progress for 20 seconds
 - Shows which indicators are stuck and at which stage
 - Shows error messages for failed indicators
@@ -56,6 +61,7 @@ sqlite3 ./data/classify-workflow-local-dev.db "ALTER TABLE processing_log ADD CO
 ## Final Configuration
 
 ### Environment Variables (`.env`)
+
 ```bash
 # SQLite Database
 CLASSIFY_DB=sqlite
@@ -67,20 +73,23 @@ INTERNAL_EMIT_DELAY_MS=500 # 500ms delay between batches
 ```
 
 ### Script Defaults
+
 ```typescript
-const batchSize = 5; // 5 indicators per batch
-const concurrentBatches = 1; // 1 batch at a time
+const batchSize = 5; // 5 indicators per batch
+const concurrentBatches = 1; // 1 batch at a time
 // Total concurrency: 5 indicators × ~6 LLM stages = ~30 API calls max
 ```
 
 ## Performance Characteristics
 
 ### Before
+
 - **Configuration:** 4 batches × 5 indicators = 20 concurrent
 - **LLM Calls:** ~120 concurrent API calls
 - **Result:** Rate limiting, stuck indicators, incomplete batches
 
 ### After
+
 - **Configuration:** 1 batch × 5 indicators = 5 concurrent
 - **LLM Calls:** ~30 concurrent API calls max
 - **Result:** Stable processing, all indicators complete, no rate limits
@@ -114,13 +123,15 @@ Next batch of 5 starts
 ## Database Compatibility
 
 ### SQLite (Local Development)
+
 - ✅ Proper `?` placeholders
 - ✅ Boolean values as 0/1 integers
 - ✅ JSON fields as TEXT
 - ✅ WAL mode enabled for concurrency
 - ✅ All schema columns present
 
 ### PostgreSQL (Production)
+
 - ✅ Proper `$1, $2` placeholders
 - ✅ Boolean values as BOOLEAN type
 - ✅ JSON fields as JSONB
@@ -130,11 +141,13 @@ Next batch of 5 starts
 ## Usage
 
 ### Run 50 Random Indicators
+
 ```bash
 deno task run:random -- -50 openai
 ```
 
 ### Expected Output
+
 ```
 🚀 Processing 50 indicators in 10 batches of 5...
 Provider: openai
@@ -152,19 +165,22 @@ deno task run:random -- -50 openai
 ```
 
 ### Average Timing
+
 - **Per indicator:** ~25-35 seconds (with OpenAI GPT-4.1-mini)
 - **Per batch (5 indicators):** ~30-45 seconds
 - **50 indicators total:** ~5-8 minutes
 
 ## Monitoring
 
 ### Check Completion Status
+
 ```bash
 sqlite3 ./data/classify-workflow-local-dev.db \
 "SELECT COUNT(*) FROM classifications;"
 ```
 
 ### Check Recent Activity
+
 ```bash
 sqlite3 ./data/classify-workflow-local-dev.db \
 "SELECT stage, status, COUNT(*) as count
@@ -174,6 +190,7 @@ sqlite3 ./data/classify-workflow-local-dev.db \
 ```
 
 ### Check Failed Indicators
+
 ```bash
 sqlite3 ./data/classify-workflow-local-dev.db \
 "SELECT indicator_id, stage, error_message
@@ -210,17 +227,20 @@ These are non-critical and logged as warnings. The pipeline will continue proces
 ## Next Steps
 
 ### To Increase Throughput (if no rate limits)
+
 1. Increase `concurrentBatches` to 2
 2. Monitor for stuck indicators
 3. Adjust based on API performance
 
 ### To Switch to PostgreSQL
+
 1. Set `POSTGRES_URL` environment variable
 2. Remove or comment out `CLASSIFY_DB=sqlite`
 3. Run migrations: `deno task migrate`
 4. Restart dev server
 
 ### To Use Local LLM (Free)
+
 1. Install LM Studio
 2. Load a model (e.g., Mistral 7B)
 3. Set environment: `LLM_PROVIDER=local`
@@ -229,22 +249,27 @@ These are non-critical and logged as warnings. The pipeline will continue proces
 ## Files Changed
 
 ### Database Layer
+
 -[src/db/repository.ts](./src/db/repository.ts)
 -[src/db/schema.ts](./src/db/schema.ts)
 -[src/db/client.ts](./src/db/client.ts)
 
 ### Workflow Steps
+
 -[steps/classify-flow/complete-classify.step.ts](./steps/classify-flow/complete-classify.step.ts)
 
 ### Scripts
+
 -[scripts/run-random.ts](./scripts/run-random.ts)
 -[scripts/run-all.ts](./scripts/run-all.ts)
 
 ### Configuration
+
 -[.env](./.env)
 -[.env.example](./.env.example)
 
 ### Documentation
+
 -[BATCHING.md](./BATCHING.md)
 -[examples/README.md](./examples/README.md)
 -[examples/parallel-batches.ts](./examples/parallel-batches.ts)
````
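
The migration summary above explains that SQLite can only bind primitives, so booleans are stored as 0/1 and objects/arrays as JSON strings, while placeholder style (`?` vs `$1`) depends on the database. A minimal sketch of those rules follows; `toSqliteParam` and `placeholder` are hypothetical helpers for illustration, not the actual code in src/db/repository.ts.

```typescript
// Sketch of the binding rules described above. SQLite accepts only numbers,
// strings, bigints, buffers, and null, so other JS values must be converted.
type SqliteParam = number | string | bigint | Uint8Array | null;

function toSqliteParam(value: unknown): SqliteParam {
  if (value === null || value === undefined) return null;
  if (typeof value === "boolean") return value ? 1 : 0; // e.g. is_cumulative, boolean_review_passed
  if (value instanceof Uint8Array) return value; // buffers pass through unchanged
  if (typeof value === "object") return JSON.stringify(value); // e.g. boolean_review_fields_wrong
  return value as SqliteParam; // numbers, strings, bigints
}

// Placeholder style differs per database, as the summary notes.
function placeholder(index: number, db: "sqlite" | "postgres"): string {
  return db === "sqlite" ? "?" : `$${index}`;
}

console.log(toSqliteParam(true)); // 1
console.log(toSqliteParam({ wrong: ["scale"] })); // '{"wrong":["scale"]}'
console.log(placeholder(1, "postgres")); // "$1"
```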
