Overview
Pipeline for extracting entities from daily content and sourcing visual assets (icons/logos). Seeking collaboration to improve coverage and methodology.
Related PR: #26
Current Pipeline
Daily Facts → Entity Extraction (LLM) → Inventory → Asset Matching → Coverage Report
Asset Matching sources: CoinGecko (tokens), manual curation (others)
Scripts
| Script | Purpose |
|---|---|
| `scripts/etl/extract-entities.py` | Extract entities via LLM |
| `scripts/posters/fetch-icons.py` | Fetch token icons from CoinGecko |
| `scripts/posters/generate-asset-checklist.py` | Generate coverage report |
Current Coverage
| Category | Coverage |
|---|---|
| Tokens | 20% (19/96) |
| Platforms | 17% (33/189) |
| Tech | 11% (18/157) |
| Projects | 14% (34/244) |
| Plugins | 30% (53/175) |
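The percentages above are simple matched/total ratios. As a minimal sketch (the names `COUNTS`, `coverage_pct`, and `overall` are illustrative and not taken from `generate-asset-checklist.py`), they can be recomputed like this:

```python
# Counts copied from the coverage table: (matched assets, total entities).
COUNTS = {
    "Tokens": (19, 96),
    "Platforms": (33, 189),
    "Tech": (18, 157),
    "Projects": (34, 244),
    "Plugins": (53, 175),
}


def coverage_pct(matched: int, total: int) -> int:
    """Return coverage as a whole-number percentage (0 when total is 0)."""
    return round(100 * matched / total) if total else 0


def overall(counts: dict) -> tuple:
    """Sum matched and total counts across the listed categories."""
    matched = sum(m for m, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return matched, total


if __name__ == "__main__":
    for name, (m, t) in COUNTS.items():
        print(f"{name}: {coverage_pct(m, t)}% ({m}/{t})")
    m, t = overall(COUNTS)
    print(f"Listed categories combined: {coverage_pct(m, t)}% ({m}/{t})")
```

Across the five listed categories this comes to 157/861, roughly 18%.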
Strengths
- Automated extraction - LLM identifies entities from unstructured content
- Normalization - `--normalize-only` dedupes without re-extraction (saves API calls)
- CoinGecko integration - Reliable token icons with rate limiting
- Fuzzy matching - Containment matching reduces false negatives
- Pre-scan efficiency - Checks existing files before making API calls
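For anyone reviewing the matching logic, here is a rough sketch of what normalization-based dedupe and containment matching could look like. It is not the code in the scripts; the function names, the normalization rule (lowercase, alphanumerics only), and the `min_len` guard are all assumptions made for illustration.

```python
import re
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def dedupe(names: list) -> list:
    """Keep the first spelling seen for each normalized key."""
    seen = {}
    for name in names:
        seen.setdefault(normalize(name), name)
    return list(seen.values())


def containment_match(entity: str, icon_stems: list, min_len: int = 3) -> Optional[str]:
    """Return the first icon stem that contains, or is contained in, the entity key.

    min_len is an assumed guard: very short keys match almost anything.
    """
    key = normalize(entity)
    if len(key) < min_len:
        return None
    for stem in icon_stems:
        s = normalize(stem)
        if len(s) >= min_len and (key in s or s in key):
            return stem
    return None
```

Containment is deliberately loose, which is why it reduces false negatives. The trade-off is false positives on short or generic names, so a minimum key length (or a manual review step) is worth considering.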
Weaknesses / Open Questions
- Low platform coverage - No reliable automated source for platform icons
- Manual curation - Plugins/projects need manual sourcing
- Entity noise - Extraction sometimes includes generic terms
- No OSINT automation - Finding official sources is still manual research
- No validation - Can't verify that an icon is authentic or up to date
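One cheap way to reduce entity noise, independent of prompt changes, is a post-extraction stoplist. The sketch below is only an illustration: `GENERIC_TERMS` is a hypothetical list, and the real set of noisy terms would have to come from inspecting the inventory.

```python
# Hypothetical stoplist; the actual generic terms should be taken from the inventory.
GENERIC_TERMS = {"ai", "api", "bot", "plugin", "token", "platform", "framework"}


def is_noise(name: str, stoplist: set = GENERIC_TERMS) -> bool:
    """Treat empty, one-character, or stoplisted names as noise."""
    key = name.strip().lower()
    return len(key) < 2 or key in stoplist


def filter_entities(names: list) -> list:
    """Return names in their original order with noisy entries removed."""
    return [n for n in names if not is_noise(n)]
```

For example, `filter_entities(["Discord", "API", "plugin", "CoinGecko"])` keeps only `Discord` and `CoinGecko`.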
Ideas for Improvement
- Better extraction prompts to reduce noise
- GitHub API for project avatars/social images
- Web scraping for official brand pages (og:image, favicons)
- Community-sourced icon contributions
- Image similarity detection to avoid duplicates
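To make the brand-page idea concrete, here is a minimal sketch that pulls `og:image` and favicon candidates out of already-fetched HTML using only the standard library. The class and function names are made up for this example, fetching and rate limiting are left out, and real pages will need more cases (for instance `twitter:image` or `apple-touch-icon`).

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class IconLinkParser(HTMLParser):
    """Record the first og:image meta tag and the first rel="icon" link."""

    def __init__(self) -> None:
        super().__init__()
        self.og_image: Optional[str] = None
        self.favicon: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("property") == "og:image":
            if self.og_image is None:
                self.og_image = a.get("content") or None
        elif tag == "link" and "icon" in a.get("rel", "").lower().split():
            if self.favicon is None:
                self.favicon = a.get("href") or None


def find_icon_candidates(html: str, base_url: str) -> dict:
    """Return absolute URLs for the og:image and favicon, or None when absent."""
    parser = IconLinkParser()
    parser.feed(html)
    return {
        "og_image": urljoin(base_url, parser.og_image) if parser.og_image else None,
        "favicon": urljoin(base_url, parser.favicon) if parser.favicon else None,
    }
```

Candidates found this way would still need a human check before being committed, since `og:image` is often a social banner rather than a logo.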
How to Contribute
- Improve coverage - Add CoinGecko ID mappings for missing tokens in `fetch-icons.py`
- Source research - Find reliable APIs/methods for platform/tech icons
- Pipeline feedback - Suggest improvements to extraction/matching logic
- Icon contributions - Submit PRs with properly sourced icons
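For contributors adding token mappings, the sketch below shows the general shape of a symbol-to-CoinGecko-ID mapping. The dictionary name and helper functions are hypothetical and may not match how `fetch-icons.py` is organized. The sketch assumes the public CoinGecko `/coins/{id}` endpoint, whose JSON response carries an `image` object with `thumb`, `small`, and `large` URLs.

```python
from typing import Optional

# Hypothetical mapping name; the values are CoinGecko coin IDs.
COINGECKO_IDS = {
    "BTC": "bitcoin",
    "ETH": "ethereum",
    "SOL": "solana",
}

API_BASE = "https://api.coingecko.com/api/v3"


def coin_url(symbol: str) -> Optional[str]:
    """Build the coin-detail URL for a symbol, or None if it is unmapped."""
    coin_id = COINGECKO_IDS.get(symbol.upper())
    return f"{API_BASE}/coins/{coin_id}" if coin_id else None


def icon_url_from_payload(payload: dict, size: str = "large") -> Optional[str]:
    """Pick an icon URL out of a parsed /coins/{id} response."""
    return (payload.get("image") or {}).get(size)


def missing_mappings(symbols: list) -> list:
    """List symbols (uppercased, sorted) that have no CoinGecko ID yet."""
    return sorted({s.upper() for s in symbols} - set(COINGECKO_IDS))
```

Running `missing_mappings` over the token entries in the inventory would give a to-do list of IDs to look up.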
Files
- `scripts/posters/assets/entity-inventory.json` - Current entity list (1143 entities)
- `scripts/posters/assets/asset-checklist.md` - Coverage report
- `scripts/posters/assets/icons/` - Downloaded icons