Search Reddit posts and comments by keyword, subreddit, author, and date range using the Arctic Shift API. Results are saved as CSV files.
1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Edit `config.yaml` with your search parameters
3. Run:

   ```
   python3 scraper.py
   ```

   Or use a custom config file:

   ```
   python3 scraper.py my_config.yaml
   ```

Results are saved to the `output_dir` folder (default `./results/`). Files are named based on your search parameters, e.g.:
results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_posts.csv
results/tesla__r_wallstreetbets__from_2024-01-01__to_2024-12-31_comments.csv
| Option | Description |
|---|---|
| `keyword` | Word(s) to search for in post titles/body or comment text |
| `subreddit` | Subreddit to search in (without the `r/` prefix) |
| `author` | Filter by username (without the `u/` prefix) |
| `after` / `before` | Date range, format `YYYY-MM-DD`, or offsets like `1year`, `30d` |
| `search_posts` | Search for matching posts (`true`/`false`) |
| `search_comments` | Search for matching comments (`true`/`false`) |
| `fetch_post_comments` | Also fetch all comments under each matched post (`true`/`false`) |
| `output_dir` | Folder to save CSV results |
| `delay_between_requests` | Seconds between API calls (default `0.5`) |
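Putting the options above together, a `config.yaml` might look like the following. The values are illustrative; check the `config.yaml` shipped with the repo for the exact expected format.

```yaml
keyword: tesla
subreddit: wallstreetbets
author: null                 # optional username filter
after: 2024-01-01            # or an offset like 1year, 30d
before: 2024-12-31
search_posts: true
search_comments: true
fetch_post_comments: false
output_dir: ./results/
delay_between_requests: 0.5
```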
| Syntax | Meaning |
|---|---|
| `word1 word2` | Must contain both words (any order) |
| `"word1 word2"` | Must appear in that exact sequence |
| `word1 OR word2` | Either word |
| `word1 -word2` | word1 but NOT word2 |
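Applied to the `keyword` option, these operators might look like the following (each line is one alternative, illustrative value):

```yaml
keyword: tesla rivian        # both words, any order
keyword: '"short squeeze"'   # exact phrase (quoted so YAML keeps the inner quotes)
keyword: tesla OR rivian     # either word
keyword: tesla -bearish      # tesla but not bearish
```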
- **Keyword search requires a filter:** The Arctic Shift API requires at least a `subreddit` or `author` when searching by keyword. A keyword-only search is not supported.
- **Comment keyword search is slow:** It uses Postgres full-text search server-side and may time out on wide date ranges. The scraper handles this automatically by splitting the range into smaller chunks and retrying. Setting `after` and `before` is recommended for comment searches.
- **Very active subreddits/users:** Keyword search may not work for extremely active subreddits or users. Try narrowing the date range.
- **Rate limiting:** The API is free. A few requests per second is fine. The scraper respects rate-limit headers automatically.
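The split-and-retry behavior for timed-out comment searches can be sketched roughly as follows. The `fetch` callable, the recursive halving, and the minimum chunk size are illustrative stand-ins, not the scraper's actual internals.

```python
from datetime import datetime, timedelta


def fetch_in_chunks(fetch, after, before, min_chunk=timedelta(days=1)):
    """Try the whole [after, before) range; on a timeout, split it in
    half and retry each half, down to a minimum chunk size."""
    try:
        return fetch(after, before)
    except TimeoutError:
        span = before - after
        if span <= min_chunk:
            raise  # can't split any further
        mid = after + span / 2
        return (fetch_in_chunks(fetch, after, mid, min_chunk)
                + fetch_in_chunks(fetch, mid, before, min_chunk))


# Demo: a fake fetch that "times out" on any range wider than 90 days.
def _fake_fetch(a, b):
    if b - a > timedelta(days=90):
        raise TimeoutError("range too wide")
    return [(a, b)]


chunks = fetch_in_chunks(_fake_fetch, datetime(2024, 1, 1), datetime(2024, 12, 31))
```

With the fake fetcher above, the 365-day range is halved three times into eight contiguous sub-ranges, each narrow enough to succeed.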
The API returns at most 100 results per request. The scraper paginates by using the timestamp of the last result as the starting point for the next request, repeating until all results are collected. Duplicates are filtered by ID.
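That pagination loop can be sketched as below. The `fetch_page` callable and the `id`/`created_utc` field names are assumptions based on typical Reddit data, not the scraper's exact code.

```python
def paginate(fetch_page, start_after, limit=100):
    """Collect all results by requesting pages of up to `limit` items,
    advancing `after` to the last item's timestamp each round and
    deduplicating by id."""
    seen, results = set(), []
    after = start_after
    while True:
        page = fetch_page(after=after, limit=limit)
        if not page:
            break
        for item in page:
            if item["id"] not in seen:
                seen.add(item["id"])
                results.append(item)
        if len(page) < limit:
            break  # partial page: no more results
        after = page[-1]["created_utc"]  # resume from the last timestamp
    return results


# Demo with a fake in-memory "API" of 7 items and a page size of 3.
_data = [{"id": i, "created_utc": 100 + i} for i in range(7)]


def _fake_fetch(after, limit):
    # Inclusive start, so each new page re-returns the boundary item --
    # exactly the overlap the dedupe step exists to absorb.
    return [d for d in _data if d["created_utc"] >= after][:limit]


all_items = paginate(_fake_fetch, start_after=0, limit=3)
```

Restarting from the last item's timestamp (rather than an opaque cursor) is why the ID-based dedupe matters: the boundary item can appear in two consecutive pages.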