shanirosen/substack-summarizer

Substack Newsletter Summarizer

An automated tool that collects the latest posts from selected Substack newsletters, generates AI-powered summaries, and delivers them as a PDF report via email. Perfect for staying up-to-date with your favorite newsletters without the time commitment.

Features

  • 📰 Automated Newsletter Collection: Fetches the latest posts from configured Substack newsletters
  • 🤖 AI-Powered Summaries: Uses OpenAI/Anthropic models to generate concise bullet-point summaries
  • 📄 PDF Report Generation: Creates professional PDF reports with all summaries
  • 📧 Email Delivery: Automatically sends reports via email
  • ☁️ Cloud-Ready: Deployable as Google Cloud Functions with scheduled execution
  • 🔐 Secure: Uses Google Cloud Secret Manager for API keys and sensitive data

Currently Monitored Newsletters

Architecture

The project consists of several key components:

  • main.py: Cloud Function entry points (HTTP and scheduled triggers)
  • substack_summerizer/main.py: Core orchestration logic
  • post_handler.py: Handles individual post processing and metadata extraction
  • content_extractor.py: AI-powered content summarization
  • generate_pdf_report.py: PDF report generation
  • email_sender.py: Email delivery functionality
  • gcp_secrets.py: Google Cloud Secret Manager integration
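The components above suggest a four-stage flow: collect posts, summarize each one, render a single PDF, and email it. The sketch below is one way such orchestration could look. The names `Post`, `Summary`, and `run_pipeline`, and the callable parameters, are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Post:
    """A single newsletter post (hypothetical shape)."""
    newsletter_url: str
    title: str
    body: str


@dataclass
class Summary:
    """A post title paired with its bullet-point summary."""
    title: str
    bullets: list[str]


def run_pipeline(
    newsletter_urls: Iterable[str],
    fetch_latest: Callable[[str], list[Post]],
    summarize: Callable[[Post], Summary],
    build_pdf: Callable[[list[Summary]], bytes],
    send_email: Callable[[bytes], None],
) -> int:
    """Collect posts, summarize them, render one PDF, and email it.

    Returns the number of posts that were summarized.
    """
    summaries: list[Summary] = []
    for url in newsletter_urls:
        for post in fetch_latest(url):
            summaries.append(summarize(post))
    if not summaries:
        return 0  # nothing to report: skip the PDF and the email
    send_email(build_pdf(summaries))
    return len(summaries)
```

Passing each stage in as a callable keeps the stages independent, so any one of them can be replaced by a stub when testing the others.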

Setup

Prerequisites

  • Python 3.12+
  • Google Cloud Platform account (for deployment)
  • Groq or Anthropic API key
  • Email credentials for report delivery

Local Development

  1. Clone the repository

    git clone <repository-url>
    cd substack-summarizer
  2. Install dependencies

    pip install -r requirements.txt
    # or with uv (recommended)
    uv sync
  3. Set up environment variables. Create a .env file or set the following environment variables:

    GROQ_API_KEY=api_key
    # Email configuration
    EMAIL_HOST=smtp.gmail.com
    EMAIL_PORT=587
    EMAIL_USER=your_email@gmail.com
    EMAIL_PASSWORD=your_app_password
  4. Run locally

    uv run python -m substack_summerizer.main
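As an illustration of how the EMAIL_* variables from step 3 might be consumed, here is a minimal sketch using only the standard library. It assumes SMTP with STARTTLS, which matches port 587. The names `EmailConfig`, `load_email_config`, `build_report_email`, and `send_report`, the subject line, and the attachment filename are all hypothetical; the repository's email_sender.py may work differently.

```python
import os
import smtplib
from dataclasses import dataclass
from email.message import EmailMessage

REQUIRED_VARS = ("EMAIL_HOST", "EMAIL_PORT", "EMAIL_USER", "EMAIL_PASSWORD")


@dataclass(frozen=True)
class EmailConfig:
    host: str
    port: int
    user: str
    password: str


def load_email_config(env=os.environ) -> EmailConfig:
    """Read the EMAIL_* variables, failing early if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return EmailConfig(
        host=env["EMAIL_HOST"],
        port=int(env["EMAIL_PORT"]),
        user=env["EMAIL_USER"],
        password=env["EMAIL_PASSWORD"],
    )


def build_report_email(cfg: EmailConfig, recipient: str, pdf_bytes: bytes) -> EmailMessage:
    """Build a plain-text message with the PDF report attached."""
    msg = EmailMessage()
    msg["From"] = cfg.user
    msg["To"] = recipient
    msg["Subject"] = "Newsletter summary report"  # hypothetical subject
    msg.set_content("The latest newsletter summaries are attached.")
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename="report.pdf"
    )
    return msg


def send_report(cfg: EmailConfig, msg: EmailMessage) -> None:
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(cfg.host, cfg.port) as smtp:
        smtp.starttls()
        smtp.login(cfg.user, cfg.password)
        smtp.send_message(msg)
```

With Gmail, EMAIL_PASSWORD is expected to be an app password rather than the account password, as the placeholder `your_app_password` indicates.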

Google Cloud Deployment

  1. Configure the deployment script. Edit deploy_cloud_function.sh to set your project ID and preferences:

    PROJECT_ID="your-gcp-project-id"
    FUNCTION_NAME="newsletter-summarizer"
    REGION="us-central1"
  2. Set up Google Cloud Secret Manager. Store your API keys and email credentials in Secret Manager:

    gcloud secrets create openai-api-key --data-file=path/to/key
    gcloud secrets create email-password --data-file=path/to/password
    # etc.
  3. Deploy

    chmod +x deploy_cloud_function.sh
    ./deploy_cloud_function.sh

The deployment creates:

  • A scheduled Cloud Function (runs weekly on Mondays at 9 AM EST)
  • An HTTP-triggered Cloud Function (for manual execution)
  • A Cloud Scheduler job for automation
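Secrets created in step 2 can be read at runtime with the google-cloud-secret-manager client library. The sketch below shows one way to do that. The function name `read_secret` and its parameters are hypothetical, and the repository's gcp_secrets.py may be structured differently.

```python
def read_secret(project_id: str, secret_id: str, client=None, version: str = "latest") -> str:
    """Return the decoded payload of one Secret Manager secret version."""
    if client is None:
        # Imported lazily so this module loads even without the GCP SDK installed.
        from google.cloud import secretmanager

        client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    # Files passed to --data-file often end with a newline, so strip it.
    return response.payload.data.decode("utf-8").strip()
```

For example, `read_secret("your-gcp-project-id", "email-password")` would fetch the latest version of the email-password secret created above, provided the function's service account has permission to access it.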

Configuration

Adding New Newsletters

Edit substack_summerizer/consts.py and add URLs to the SUBSTACK_NEWSLETTERS_URLS list:

SUBSTACK_NEWSLETTERS_URLS = [
    "https://newsletter.pragmaticengineer.com",
    "https://your-new-newsletter.substack.com",
    # Add more URLs here
]
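How the tool turns each URL into posts is not described here. One plausible approach, sketched below, assumes that each newsletter exposes an RSS feed at `<url>/feed` and reads it with the standard library. The function names, the `limit` parameter, and the returned dictionary keys are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_url(newsletter_url: str) -> str:
    """Assumed feed location: the newsletter URL followed by /feed."""
    return newsletter_url.rstrip("/") + "/feed"


def parse_latest_posts(rss_xml, limit: int = 3) -> list[dict]:
    """Extract title, link, and publication date from RSS, newest first."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iterfind("./channel/item"):
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue  # skip items that cannot be ordered by date
        posts.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": parsedate_to_datetime(pub_date),
        })
    posts.sort(key=lambda p: p["published"], reverse=True)
    return posts[:limit]


def fetch_latest_posts(newsletter_url: str, limit: int = 3) -> list[dict]:
    """Download the feed and return its most recent posts."""
    with urllib.request.urlopen(feed_url(newsletter_url), timeout=30) as resp:
        return parse_latest_posts(resp.read(), limit)
```

Keeping the parsing separate from the download means the parser can be exercised on a saved feed without network access.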

Manual Execution

Call the HTTP Cloud Function to trigger a run on demand:

curl -X POST https://your-region-your-project.cloudfunctions.net/newsletter-summarizer-http
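The same call can be made from Python with the standard library. This is a minimal sketch: the function names and the timeout value are hypothetical, and it assumes the endpoint accepts unauthenticated requests, which depends on how the function was deployed.

```python
import urllib.request


def build_trigger_request(function_url: str) -> urllib.request.Request:
    """Build an empty-body POST, mirroring `curl -X POST <url>`."""
    return urllib.request.Request(function_url, data=b"", method="POST")


def trigger_summary(function_url: str, timeout: float = 600.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_trigger_request(function_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

The generous timeout reflects that a run fetches, summarizes, and emails several newsletters before the function responds; adjust it to match the function's configured timeout.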

Scheduled Execution

The Cloud Scheduler automatically triggers the function weekly. You can modify the schedule in deploy_cloud_function.sh:

--schedule="0 9 * * 1"  # Every Monday at 9 AM
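The schedule string uses the standard five-field cron format. As a reading aid, this small sketch labels each field; `CRON_FIELDS` and `parse_cron` are illustrative names, not part of the project.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expr: str) -> dict[str, str]:
    """Label the five whitespace-separated fields of a cron expression."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


print(parse_cron("0 9 * * 1"))
```

Here minute 0 and hour 9 give 9:00, the two `*` fields match any day of the month and any month, and day of week 1 is Monday. Changing the last field to `1-5`, for instance, would run the job every weekday.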

Project Structure

substack-summarizer/
├── main.py                          # Cloud Function entry points
├── requirements.txt                 # Python dependencies
├── pyproject.toml                   # Project configuration
├── deploy_cloud_function.sh         # Deployment script
├── substack_summerizer/
│   ├── main.py                      # Core application logic
│   ├── consts.py                    # Configuration constants
│   ├── content_extractor.py         # AI summarization
│   ├── email_sender.py              # Email delivery
│   ├── gcp_secrets.py               # Secret management
│   ├── generate_pdf_report.py       # PDF generation
│   └── post_handler.py              # Post processing
└── README.md                        # This file

About

Collects data from Substack newsletters, summarizes them, and sends an email report.
