Lightweight scraper + ETL for Roblox accessories, storing results in SQLite and exposing a small FastAPI read-only API.
Contents
- UTILS.py - core scraping and DB helpers (init, scrape_new_items, insert_rows, get_most_recent_link, send_email)
- ETL.py - one-shot ETL runner that scrapes new items and inserts them into the DB
- fastAPI.py - small read-only API to query the scraped data
- creds.json - (not checked in) credentials and local chrome/driver paths
- example.db - example SQLite database (created at runtime)
Overview
This repo scrapes the Roblox accessories catalog periodically (ETL job) and persists each item to a local SQLite database. The API serves the stored rows for simple queries, pagination and basic stats.
Database schema (sqlite):
- Table: roblox_accessories
  - id INTEGER PRIMARY KEY AUTOINCREMENT
  - Name TEXT
  - category TEXT
  - price TEXT -- kept as TEXT to support values like Free/unavailable
  - Creator TEXT
  - IsVerified INTEGER (0/1)
  - IsLimited INTEGER (0/1)
  - Link TEXT (unique index)
  - ImageURL TEXT
  - timeCollected TEXT (ISO datetime string)
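The schema above can be expressed as SQLite DDL. The following is a minimal sketch rather than the repository's actual init code: the column names and types come from the list above, while the index name `idx_roblox_accessories_link` and the helper `create_schema` are assumptions made for illustration.

```python
import sqlite3

# Columns follow the schema listed above. The index name is hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS roblox_accessories (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    Name          TEXT,
    category      TEXT,
    price         TEXT,     -- TEXT so 'Free' / 'unavailable' fit
    Creator       TEXT,
    IsVerified    INTEGER,  -- 0/1
    IsLimited     INTEGER,  -- 0/1
    Link          TEXT,
    ImageURL      TEXT,
    timeCollected TEXT      -- ISO datetime string
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_roblox_accessories_link
    ON roblox_accessories (Link);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and the unique Link index if they do not exist yet."""
    conn.executescript(SCHEMA)
    conn.commit()


# Demo against an in-memory database so no file is created.
conn = sqlite3.connect(":memory:")
create_schema(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(roblox_accessories)")]
print(columns)
```

The unique index on Link is what lets a later INSERT OR IGNORE skip rows that are already stored.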
Prerequisites
- Python 3.10+ (project tested on CPython 3.10+)
- Chrome and a compatible chromedriver installed locally
- creds.json with at least the following keys (example):
{
"chrome_executable_path": "Path to your chrome.exe",
"driver_executable_path": "Path to your chromedriver.exe",
"email": "you@example.com",
"password": "app-password-or-smtp-pass",
"send_to_email": "notify@example.com"
}
Install dependencies
python -m pip install -r requirements.txt
Run the ETL (one-shot)
python ETL.py
ETL behavior
- ETL.py calls get_most_recent_link() to find the last stored Link and scrapes newer items until it reaches that link (the stop is exclusive). New rows are then inserted with INSERT OR IGNORE to avoid duplicates.
- The scraper stores price as TEXT so it supports Free, unavailable, or numeric strings.
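The two steps described above (stop at the last stored link, then insert while ignoring duplicates) might look roughly like this. It is a sketch under stated assumptions, not the code in UTILS.py: `take_until_link`, `insert_new`, and `row_count` are hypothetical helpers, the scraped list is assumed to be ordered newest first, the table is trimmed to a few columns, and the example.invalid links are placeholders.

```python
import sqlite3
from datetime import datetime, timezone


def take_until_link(items, stop_link):
    """Collect items in listing order, stopping before the one matching stop_link."""
    fresh = []
    for item in items:
        if item["Link"] == stop_link:
            break  # exclusive stop: the already-stored item is not collected again
        fresh.append(item)
    return fresh


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM roblox_accessories").fetchone()[0]


def insert_new(conn, items):
    """Insert with INSERT OR IGNORE and return how many rows were actually added."""
    before = row_count(conn)
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO roblox_accessories (Name, price, Link, timeCollected) "
        "VALUES (?, ?, ?, ?)",
        [(i["Name"], i["price"], i["Link"], now) for i in items],
    )
    conn.commit()
    return row_count(conn) - before


# Trimmed table for the demo; Link is UNIQUE so duplicates are ignored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roblox_accessories (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "Name TEXT, price TEXT, Link TEXT UNIQUE, timeCollected TEXT)"
)

# Placeholder data, assumed newest first; prices stay strings.
scraped = [
    {"Name": "Hat C", "price": "Free", "Link": "https://example.invalid/c"},
    {"Name": "Hat B", "price": "150", "Link": "https://example.invalid/b"},
    {"Name": "Hat A", "price": "unavailable", "Link": "https://example.invalid/a"},
]

fresh = take_until_link(scraped, "https://example.invalid/a")
added_first = insert_new(conn, fresh)
added_again = insert_new(conn, fresh)  # same rows again: all ignored
print(len(fresh), added_first, added_again)
```

Running the insert twice shows why re-running the ETL is safe: the second pass adds nothing because every Link already exists.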
Run the API (development)
uvicorn fastAPI:app --reload --port 8000
API highlights
- GET /items - list items with limit, offset, and filters: creator, category, verified, limited
- GET /items/{item_id} - retrieve a single item by id
- GET /stats - database statistics
- GET /recent - most recent items by timeCollected
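To illustrate how the GET /items parameters could map onto the table, here is a hedged sketch of a parameterized query builder. The function `build_items_query`, its defaults, and the `ORDER BY id` choice are assumptions for this example. The actual handler in fastAPI.py may be structured differently.

```python
def build_items_query(limit=50, offset=0, creator=None, category=None,
                      verified=None, limited=None):
    """Build a parameterized SELECT for the filters listed under GET /items."""
    # Column names are fixed here (never taken from user input), so only
    # the values need to be bound as parameters.
    filters = [
        ("Creator", creator),
        ("category", category),
        ("IsVerified", verified),
        ("IsLimited", limited),
    ]
    clauses, params = [], []
    for column, value in filters:
        if value is not None:
            clauses.append(f"{column} = ?")
            # Booleans are stored as 0/1 integers in the schema.
            params.append(int(value) if isinstance(value, bool) else value)

    sql = "SELECT * FROM roblox_accessories"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id LIMIT ? OFFSET ?"
    params.extend([limit, offset])
    return sql, params


sql, params = build_items_query(limit=10, creator="Roblox", verified=True)
print(sql)
print(params)
```

Binding values with `?` placeholders keeps filter values out of the SQL string itself, which matters for an API that accepts them from query parameters.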
Notes and troubleshooting
- Make sure the paths in creds.json are correct and Chrome/Chromedriver versions are compatible.
- If scraping fails frequently, enable a visible browser (set headless=False) to debug page loads and XPaths.
- The repository uses undetected_chromedriver to reduce bot detection, but behavior may still change if Roblox updates their markup.
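The creds.json check from the first note can be done before launching the browser. This is a minimal sketch assuming the key names from the example under Prerequisites. `check_creds` and `load_creds` are hypothetical helpers, not functions from UTILS.py.

```python
import json
import os

# Keys taken from the creds.json example under Prerequisites.
REQUIRED_KEYS = (
    "chrome_executable_path",
    "driver_executable_path",
    "email",
    "password",
    "send_to_email",
)
PATH_KEYS = ("chrome_executable_path", "driver_executable_path")


def load_creds(path="creds.json"):
    """Read the credentials file as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_creds(creds: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in creds]
    for key in PATH_KEYS:
        path = creds.get(key)
        if path is not None and not os.path.isfile(path):
            problems.append(f"{key} does not point to a file: {path}")
    return problems


# Demo with an incomplete dict instead of a real creds.json.
example = {"chrome_executable_path": "/nonexistent/chrome", "email": "you@example.com"}
problems = check_creds(example)
for problem in problems:
    print(problem)
```

This only confirms that the keys exist and the paths point to files. It does not verify that the Chrome and chromedriver versions match.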
Security
- Do not commit creds.json or any secrets to source control.