Dagster Reddit ETL Project

This project uses Dagster to create a simple ETL (Extract, Transform, Load) pipeline that fetches new submissions from a specified subreddit using the Reddit API and stores them in a local SQLite database.

Overview

The pipeline consists of two main assets:

- reddit_submissions: Extracts data from a subreddit, transforms it, and loads new submissions into a SQLite submissions table.
- preview_top_submissions: A downstream asset that runs after reddit_submissions and displays a preview of the 10 most recent posts.
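
The "loads new submissions" behavior of reddit_submissions can be pictured as an idempotent insert. The following is a minimal sketch, not the project's actual code: only the submissions table name comes from this README, while the column names (id, title, created_utc) and the function name load_new_submissions are assumptions made for illustration.

```python
import sqlite3


def load_new_submissions(conn, rows):
    """Insert rows whose id is not already stored; return how many were added.

    Each row is a hypothetical (id, title, created_utc) tuple.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS submissions (
            id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            created_utc REAL NOT NULL
        )
        """
    )
    before = conn.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]
    # The primary key on id makes SQLite skip rows that are already present.
    conn.executemany(
        "INSERT OR IGNORE INTO submissions (id, title, created_utc) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM submissions").fetchone()[0]
    return after - before
```

Because duplicates are ignored rather than rejected, re-running the load with overlapping data should leave existing rows untouched and add only the unseen ones.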

The project is designed to be configurable, allowing you to easily change the target subreddit and the number of posts to fetch via a config.ini file.
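
As a rough illustration of how such settings might be read, here is a sketch using Python's standard configparser module. The section name reddit and the keys subreddit and post_limit are assumptions; check the repository's config.ini for the names it actually uses.

```python
import configparser

# Hypothetical config.ini contents; the real file may use other names.
EXAMPLE_CONFIG = """
[reddit]
subreddit = dataengineering
post_limit = 25
"""


def read_settings(text):
    """Parse INI text and return (subreddit, post_limit)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["reddit"]
    # getint converts the stored string to an integer.
    return section["subreddit"], section.getint("post_limit")


subreddit, post_limit = read_settings(EXAMPLE_CONFIG)
print(subreddit, post_limit)
```

To read the file from disk instead of a string, parser.read("config.ini") can replace the read_string call.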

Asset Graph Preview

Here is a preview of the asset graph in the Dagster UI, showing the dependency between the two assets.

Prerequisites

- Python >=3.9, <3.13
- A Reddit account with API credentials
- uv (a fast Python package installer and resolver)

Setup and Installation

Clone the Repository: Start by cloning the project repository to your local machine.
```
git clone https://github.com/rohanvh7/Reddit-Analysis.git
cd Reddit-Analysis
```
Create a Virtual Environment: Using a virtual environment is highly recommended, and uv can create one for you.
```
uv venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
```
Install Dependencies with uv: With your virtual environment activated, run uv sync to install the required dependencies, including dagster, praw, and pandas, as defined in pyproject.toml. To also install the optional extras (such as development dependencies like pytest), add the --all-extras flag.
```
uv sync --all-extras
```
Configure Environment Variables: This project uses a .env file to keep your Reddit API credentials out of the source code. Create a file named .env in the root directory of the project.

Copy the following format into your .env file and replace the placeholder values with your actual Reddit credentials.
```
# .env file
REDDIT_CLIENT_ID=YOUR_CLIENT_ID_HERE
REDDIT_CLIENT_SECRET=YOUR_CLIENT_SECRET_HERE
REDDIT_USERNAME=YOUR_USERNAME_HERE
REDDIT_PASSWORD=YOUR_PASSWORD_HERE
REDDIT_USER_AGENT=MyDagsterApp/0.1 by u/YourUsername
```
Important: The .gitignore file is already configured to ignore .env, so your secrets are not committed to version control. Do not wrap the credential values in double quotes or angle brackets (<>) in the .env file.
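
A quick sanity check for these variables could look like the sketch below. It is not part of the project: the helper name find_credential_problems is hypothetical, and the check only covers the issues mentioned above (missing values, double quotes, and angle brackets).

```python
import os

REQUIRED_VARS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
    "REDDIT_USER_AGENT",
)


def find_credential_problems(env):
    """Return a list of human-readable problems found in the given mapping."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is missing or empty")
        elif value[0] in '"<' or value[-1] in '">':
            problems.append(f"{name} appears to be wrapped in quotes or <>")
    return problems


if __name__ == "__main__":
    # Checks the current process environment, for example after the
    # variables from .env have been exported into the shell.
    for problem in find_credential_problems(os.environ):
        print(problem)
```

The function takes any mapping, so it can be tried against a plain dictionary before pointing it at os.environ.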

How to Run the Project

With your virtual environment activated and your .env file configured, you can launch the Dagster UI.

Start the Dagster UI: From your project's root directory, run the dagster dev command. Dagster will automatically find your code location based on the [tool.dagster] section of your pyproject.toml.
```
dagster dev
```
Access the UI: Open your web browser and navigate to http://localhost:3000.

Materialize the Assets: In the Dagster UI, you will see the asset graph. To run the full pipeline:
- Select the preview_top_submissions asset.
- Click the "Materialize" button. Dagster will automatically run the upstream reddit_submissions asset first.
Upon successful completion, a submissions.db file will be created in your project directory, and the run logs for the preview asset will display a table of the latest posts.
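
To inspect the stored posts outside the Dagster UI, a query along the following lines may help. This is a sketch under assumptions: the submissions table name comes from this README, but the columns title and created_utc and the function name latest_submissions are hypothetical, so adjust them to the actual schema.

```python
import sqlite3


def latest_submissions(conn, limit=10):
    """Return (title, created_utc) tuples, newest first."""
    cursor = conn.execute(
        "SELECT title, created_utc FROM submissions "
        "ORDER BY created_utc DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Demo on an in-memory database with made-up rows and an assumed schema.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE submissions (id TEXT PRIMARY KEY, title TEXT, created_utc REAL)"
    )
    demo.executemany(
        "INSERT INTO submissions VALUES (?, ?, ?)",
        [(str(i), f"Post {i}", float(i)) for i in range(3)],
    )
    for title, created_utc in latest_submissions(demo):
        print(created_utc, title)
```

To run the same query against the real file, pass sqlite3.connect("submissions.db") from the project directory instead of the in-memory connection.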

Credits

- AI Assistance: Gemini 2.5 Pro
- Reddit API Wrapper: PRAW
