guardian-streamer

A tiny Python CLI that fetches the latest Guardian articles via their open API and publishes them to an AWS Kinesis stream (useful for data-platform demos, marketing feeds, etc.).

Features

  • Search Guardian Content API by keyword + optional date filter
  • Transform response to lightweight JSON schema
  • Publish up to 10 newest articles to Kinesis (JSON batch)
  • Include a content_preview field with the first 1000 characters of the article body
  • Unit-tested, PEP-8 compliant, no secrets in code
  • Terraform build included
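
The search and transform steps listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `build_search_params` and `to_lightweight` are hypothetical, and the output field names other than `content_preview` are assumptions borrowed from the Guardian response. The query parameters (`q`, `from-date`, `order-by`, `page-size`, `show-fields`, `api-key`) are those of the Guardian Content API search endpoint.

```python
from typing import Any, Optional

# Guardian Content API search endpoint.
GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"
PREVIEW_LENGTH = 1000  # characters of article body kept in content_preview
MAX_ARTICLES = 10      # the README states up to 10 newest articles


def build_search_params(
    search_term: str, api_key: str, date_from: Optional[str] = None
) -> dict[str, str]:
    """Build query parameters for a keyword search, newest first."""
    params = {
        "q": search_term,
        "order-by": "newest",
        "page-size": str(MAX_ARTICLES),
        "show-fields": "body",  # ask the API to include the article body
        "api-key": api_key,
    }
    if date_from:
        # Optional date filter in YYYY-MM-DD form.
        params["from-date"] = date_from
    return params


def to_lightweight(result: dict[str, Any]) -> dict[str, Any]:
    """Reduce one search result to a small JSON-friendly dict.

    Field names other than content_preview are assumed for illustration.
    The body is truncated as-is, so any HTML markup in it is kept.
    """
    body = result.get("fields", {}).get("body", "")
    return {
        "webPublicationDate": result.get("webPublicationDate"),
        "webTitle": result.get("webTitle"),
        "webUrl": result.get("webUrl"),
        "content_preview": body[:PREVIEW_LENGTH],
    }
```

The parameters would be sent as a GET request to `GUARDIAN_SEARCH_URL`, and each item in the response's `response.results` list passed through `to_lightweight`.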

Quick start

  1. Install uv
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    
  2. Clone & enter repo
    git clone https://github.com/brxzee/guardian-streamer.git
    cd guardian-streamer 
    
    
  3. Add Guardian API key (never commit)
    echo "GUARDIAN_API_KEY=your_key_here" > .env
    
    
  4. Setup AWS credentials (SSO example)
    aws sso login --profile admin-sso       # one-time browser login
    export AWS_PROFILE=admin-sso            # or add to .env: AWS_PROFILE=admin-sso
    
    
  5. Deploy AWS resources with Terraform
    terraform init
    terraform apply -auto-approve
    
    
  6. Install dependencies & run
    uv sync
    uv run python main.py "machine learning" --date-from 2023-01-01
    

Output:

{
    "record_sequence": "49622369389484470503062566976986553315237083692457033746",
    "article_count": 10,
    "article_titles": [
        "Machine learning model spots Covid variants...",
        "..."
    ]
}
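
A summary of this shape could be produced along the following lines. This is a sketch under assumptions rather than the project's implementation: `publish_articles` and the `"guardian"` partition key are hypothetical, and the articles are assumed to be dicts with a `webTitle` key, already sorted newest first. The client is passed in so the function works with a real `boto3.client("kinesis")` or a stand-in; `put_record` and the `SequenceNumber` response field are part of the Kinesis API.

```python
import json
from typing import Any


def publish_articles(
    articles: list[dict[str, Any]],
    kinesis_client: Any,
    stream_name: str = "guardian_content",
) -> dict[str, Any]:
    """Publish up to 10 articles as one JSON batch and summarise the result."""
    batch = articles[:10]  # assumes the list is already newest-first
    response = kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(batch).encode("utf-8"),
        PartitionKey="guardian",  # hypothetical key chosen for illustration
    )
    return {
        "record_sequence": response["SequenceNumber"],
        "article_count": len(batch),
        "article_titles": [article["webTitle"] for article in batch],
    }
```

In real use the client would be created with `boto3.client("kinesis")`, which picks up the credentials selected by `AWS_PROFILE`.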

Run test suite

uv run pytest -v

Environment variables

Variable             Purpose                      Default
GUARDIAN_API_KEY     Guardian open-platform key   required
AWS_PROFILE          AWS SSO profile (or keys)    None
KINESIS_STREAM_NAME  Target Kinesis stream        guardian_content
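
As an illustration of how the variables above might be read with their defaults, here is a small sketch. The `load_config` helper is hypothetical and not taken from the repository; loading values from a `.env` file is not shown and would need a separate step.

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict[str, Optional[str]]:
    """Read settings from the environment, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("GUARDIAN_API_KEY")
    if not api_key:
        # The key has no default, so fail early with a clear message.
        raise RuntimeError("GUARDIAN_API_KEY is required")
    return {
        "guardian_api_key": api_key,
        "aws_profile": env.get("AWS_PROFILE"),  # defaults to None
        "kinesis_stream_name": env.get("KINESIS_STREAM_NAME", "guardian_content"),
    }
```

Passing a plain dict instead of relying on `os.environ` keeps the helper easy to exercise in unit tests.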
