Search Reddit posts and comments by keyword, subreddit, author, and date range using the Arctic Shift API. Results are saved as CSV files.
1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Edit `config.yaml` with your search parameters
3. Run:

   ```
   python3 scraper.py
   ```

   Or use a custom config file:

   ```
   python3 scraper.py my_config.yaml
   ```

Results are saved to the `output_dir` folder (default `./results/`). Files are named based on your search parameters, e.g.:
results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_posts.csv
results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_comments.csv
| Option | Description |
|---|---|
| `keyword` | Word(s) to search for in post titles/body or comment text |
| `subreddit` | Subreddit to search in (without the `r/` prefix) |
| `author` | Filter by username (without the `u/` prefix) |
| `after` / `before` | Date range, format `YYYY-MM-DD`, or offsets like `1year`, `30d` |
| `search_posts` | Search for matching posts (`true`/`false`) |
| `search_comments` | Search for matching comments (`true`/`false`) |
| `fetch_post_comments` | Also fetch all comments under each matched post (`true`/`false`) |
| `output_dir` | Folder to save CSV results |
| `delay_between_requests` | Seconds between API calls (default `0.5`) |
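Putting the options above together, a `config.yaml` might look like the following. The values are illustrative; check the `config.yaml` shipped with the repo for the exact expected format.

```yaml
keyword: tesla
subreddit: wallstreetbets
author: null                 # optional username filter
after: 2024-01-01            # or an offset like 1year, 30d
before: 2024-12-31
search_posts: true
search_comments: true
fetch_post_comments: false
output_dir: ./results/
delay_between_requests: 0.5
```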
| Syntax | Meaning |
|---|---|
| `word1 word2` | Must contain both words (any order) |
| `"word1 word2"` | Must appear in that exact sequence |
| `word1 OR word2` | Either word |
| `word1 -word2` | word1 but NOT word2 |
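Applied to the `keyword` option, these operators might look like the following (each line is one alternative, illustrative value):

```yaml
keyword: tesla rivian        # both words, any order
keyword: '"short squeeze"'   # exact phrase (quoted so YAML keeps the inner quotes)
keyword: tesla OR rivian     # either word
keyword: tesla -bearish      # tesla but not bearish
```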
- **Keyword search requires a filter:** The Arctic Shift API requires at least a `subreddit` or `author` when searching by keyword. A keyword-only search is not supported.
- **Comment keyword search is slow:** It uses Postgres full-text search server-side and may time out on wide date ranges. The scraper handles this automatically by splitting the range into smaller chunks and retrying. Setting `after` and `before` is recommended for comment searches.
- **Very active subreddits/users:** Keyword search may not work for extremely active subreddits or users. Try narrowing the date range.
- **Rate limiting:** The API is free. A few requests per second is fine. The scraper respects rate-limit headers automatically.
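The split-and-retry behavior for timed-out comment searches can be sketched roughly as follows. The `fetch` callable, the recursive halving, and the minimum chunk size are illustrative stand-ins, not the scraper's actual internals.

```python
from datetime import datetime, timedelta


def fetch_in_chunks(fetch, after, before, min_chunk=timedelta(days=1)):
    """Try the whole [after, before) range; on a timeout, split it in
    half and retry each half, down to a minimum chunk size."""
    try:
        return fetch(after, before)
    except TimeoutError:
        span = before - after
        if span <= min_chunk:
            raise  # can't split any further
        mid = after + span / 2
        return (fetch_in_chunks(fetch, after, mid, min_chunk)
                + fetch_in_chunks(fetch, mid, before, min_chunk))


# Demo: a fake fetch that "times out" on any range wider than 90 days.
def _fake_fetch(a, b):
    if b - a > timedelta(days=90):
        raise TimeoutError("range too wide")
    return [(a, b)]


chunks = fetch_in_chunks(_fake_fetch, datetime(2024, 1, 1), datetime(2024, 12, 31))
```

With the fake fetcher above, the 365-day range is halved three times into eight contiguous sub-ranges, each narrow enough to succeed.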
The API returns at most 100 results per request. The scraper paginates by using the timestamp of the last result as the starting point for the next request, repeating until all results are collected. Duplicates are filtered by ID.
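That pagination loop can be sketched as below. The `fetch_page` callable and the `id`/`created_utc` field names are assumptions based on typical Reddit data, not the scraper's exact code.

```python
def paginate(fetch_page, start_after, limit=100):
    """Collect all results by requesting pages of up to `limit` items,
    advancing `after` to the last item's timestamp each round and
    deduplicating by id."""
    seen, results = set(), []
    after = start_after
    while True:
        page = fetch_page(after=after, limit=limit)
        if not page:
            break
        for item in page:
            if item["id"] not in seen:
                seen.add(item["id"])
                results.append(item)
        if len(page) < limit:
            break  # partial page: no more results
        after = page[-1]["created_utc"]  # resume from the last timestamp
    return results


# Demo with a fake in-memory "API" of 7 items and a page size of 3.
_data = [{"id": i, "created_utc": 100 + i} for i in range(7)]


def _fake_fetch(after, limit):
    # Inclusive start, so each new page re-returns the boundary item --
    # exactly the overlap the dedupe step exists to absorb.
    return [d for d in _data if d["created_utc"] >= after][:limit]


all_items = paginate(_fake_fetch, start_after=0, limit=3)
```

Restarting from the last item's timestamp (rather than an opaque cursor) is why the ID-based dedupe matters: the boundary item can appear in two consecutive pages.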