arvndk/reddit_scraper

Reddit Scraper (Arctic Shift API)

Search Reddit posts and comments by keyword, subreddit, author, and date range using the Arctic Shift API. Results are saved as CSV files.

Setup

pip install -r requirements.txt

Usage

  1. Edit config.yaml with your search parameters
  2. Run:
python3 scraper.py

Or use a custom config file:

python3 scraper.py my_config.yaml

Results are saved to the output_dir folder (default ./results/). Files are named based on your search parameters, e.g.:

results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_posts.csv
results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_comments.csv
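The naming pattern in the example above can be approximated as follows. This is only a sketch inferred from the two sample filenames: the function name `build_output_path`, the `u_` prefix for authors, and the `all` fallback are assumptions, and the real scraper may sanitize keywords (spaces, quotes) differently.

```python
from pathlib import Path


def build_output_path(output_dir, kind, keyword=None, subreddit=None,
                      author=None, after=None, before=None):
    """Build a CSV path like the sample filenames (hypothetical helper).

    `kind` is "posts" or "comments". Parameters that are not set are skipped.
    """
    parts = []
    if keyword:
        parts.append(keyword)
    if subreddit:
        parts.append(f"r_{subreddit}")
    if author:
        parts.append(f"u_{author}")  # assumption: mirrors the r_ prefix
    if after:
        parts.append(f"from_{after}")
    if before:
        parts.append(f"to_{before}")
    stem = "__".join(parts) or "all"  # assumption: fallback when nothing is set
    return Path(output_dir) / f"{stem}_{kind}.csv"


path = build_output_path("results", "posts", keyword="tesla",
                         subreddit="wallstreetbets",
                         after="2024-01-01", before="2024-12-31")
print(path.as_posix())
```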

Config options

| Option | Description |
| --- | --- |
| keyword | Word(s) to search for in post titles/body or comment text |
| subreddit | Subreddit to search in (without r/ prefix) |
| author | Filter by username (without u/ prefix) |
| after / before | Date range, format YYYY-MM-DD or offsets like 1year, 30d |
| search_posts | Search for matching posts (true/false) |
| search_comments | Search for matching comments (true/false) |
| fetch_post_comments | Also fetch all comments under each matched post (true/false) |
| output_dir | Folder to save CSV results |
| delay_between_requests | Seconds between API calls (default 0.5) |
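To illustrate how these options fit together, here is a minimal sketch of normalizing an already-parsed config dictionary. The name `normalize_config` is hypothetical and not taken from scraper.py. Only the two defaults stated in this README are filled in, and the check reflects the API's requirement that a keyword search needs a subreddit or author.

```python
# Defaults stated in this README; other options have no documented default.
DEFAULTS = {"output_dir": "./results/", "delay_between_requests": 0.5}


def normalize_config(raw):
    """Fill documented defaults and reject keyword-only searches.

    `raw` is the dict obtained from parsing config.yaml (hypothetical helper).
    """
    # Treat missing or null entries as "not set" so defaults can apply.
    provided = {key: value for key, value in raw.items() if value is not None}
    config = {**DEFAULTS, **provided}
    if config.get("keyword") and not (config.get("subreddit") or config.get("author")):
        raise ValueError("keyword search needs at least a subreddit or an author")
    return config


config = normalize_config({"keyword": "tesla", "subreddit": "wallstreetbets"})
print(config["output_dir"], config["delay_between_requests"])
```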

Keyword syntax

| Syntax | Meaning |
| --- | --- |
| word1 word2 | Must contain both words (any order) |
| "word1 word2" | Must appear in that exact sequence |
| word1 OR word2 | Either word |
| word1 -word2 | word1 but NOT word2 |

API limitations

  • Keyword search requires a filter: The Arctic Shift API requires at least a subreddit or author when searching by keyword. A keyword-only search is not supported.
  • Comment keyword search is slow: It uses Postgres full-text search server-side and may time out on wide date ranges. The scraper handles this automatically by splitting the range into smaller chunks and retrying. Setting after and before is recommended for comment searches.
  • Very active subreddits/users: Keyword search may not work for extremely active subreddits or users. Try narrowing the date range.
  • Rate limiting: The API is free. A few requests per second is fine. The scraper respects rate limit headers automatically.
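The split-and-retry behaviour described for slow comment searches could look roughly like the sketch below. It is not the scraper's actual code: `fetch` stands in for whatever function queries the API for a half-open date range, and halving the range, the `TimeoutError` signal, and the one-day minimum span are all assumptions.

```python
from datetime import datetime, timedelta


def fetch_with_splitting(fetch, start, end, min_span=timedelta(days=1)):
    """Call fetch(start, end); on timeout, halve the range and retry each half.

    `fetch` is a hypothetical callable that returns a list of results for
    [start, end) and raises TimeoutError when the server takes too long.
    """
    try:
        return fetch(start, end)
    except TimeoutError:
        if end - start <= min_span:
            raise  # range is already small; give up instead of looping forever
        midpoint = start + (end - start) / 2
        return (fetch_with_splitting(fetch, start, midpoint, min_span)
                + fetch_with_splitting(fetch, midpoint, end, min_span))


def fake_fetch(start, end):
    # Stand-in for the API: "times out" on ranges longer than 100 days.
    if end - start > timedelta(days=100):
        raise TimeoutError
    return [(start, end)]


chunks = fetch_with_splitting(fake_fetch, datetime(2024, 1, 1), datetime(2025, 1, 1))
print(len(chunks), "chunks fetched")
```

Recursive halving keeps narrow ranges cheap, since only the ranges that actually time out are split further.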

How pagination works

The API returns at most 100 results per request. The scraper paginates by using the timestamp of the last result as the starting point for the next request, and repeats until all results are collected. Duplicates are filtered by ID.
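A minimal sketch of this timestamp-based pagination is shown below. It makes several assumptions: `fetch_page` is a hypothetical stand-in for the API call, results arrive sorted oldest first, the cursor boundary is inclusive (so the last item can reappear on the next page), and each item carries `id` and `created_utc` fields.

```python
def collect_all(fetch_page, start_ts, page_size=100):
    """Collect every result by advancing a timestamp cursor page by page.

    `fetch_page(cursor, limit)` is a hypothetical callable returning up to
    `limit` items with created_utc >= cursor, sorted ascending.
    """
    seen_ids = set()
    results = []
    cursor = start_ts
    while True:
        page = fetch_page(cursor, page_size)
        new_items = 0
        for item in page:
            if item["id"] not in seen_ids:  # filter duplicates by ID
                seen_ids.add(item["id"])
                results.append(item)
                new_items += 1
        # Stop on a short page, or when a full page brought nothing new
        # (which would otherwise loop forever on the same cursor).
        if len(page) < page_size or new_items == 0:
            break
        cursor = page[-1]["created_utc"]
    return results


# Stand-in data: 250 items, two per timestamp, to exercise the overlap case.
DATA = [{"id": str(i), "created_utc": i // 2} for i in range(250)]


def fake_fetch_page(cursor, limit):
    return [item for item in DATA if item["created_utc"] >= cursor][:limit]


items = collect_all(fake_fetch_page, start_ts=0)
print(len(items), "unique items collected")
```

Because several items can share one timestamp, restarting from the last timestamp re-fetches some items, which is why the ID filter is needed.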
