A tiny Python CLI that fetches the latest Guardian articles via their open API and publishes them to an AWS Kinesis stream (useful for data-platform demos, marketing feeds, etc.).
- Search Guardian Content API by keyword + optional date filter
- Transform response to lightweight JSON schema
- Publish up to 10 newest articles to Kinesis (JSON batch)
- `content_preview` field included: the first 1000 characters of the article body
- Unit-tested, PEP-8 compliant, no secrets in code
- Terraform build included
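
The search and transform steps listed above could look roughly like the sketch below. This is not the repository's actual code: the function names `build_search_params`, `to_record`, and `transform` are hypothetical, and every output field name except `content_preview` is an assumption. The query parameters (`q`, `from-date`, `order-by`, `page-size`, `show-fields`, `api-key`) and the response fields (`webTitle`, `webUrl`, `webPublicationDate`, `fields.body`) are those of the Guardian Content API search endpoint.

```python
from typing import Optional

PREVIEW_CHARS = 1000  # preview length stated in the feature list
MAX_ARTICLES = 10     # batch size stated in the feature list


def build_search_params(term: str, api_key: str,
                        date_from: Optional[str] = None) -> dict:
    """Build query parameters for the Guardian /search endpoint (sketch)."""
    params = {
        "q": term,
        "order-by": "newest",       # newest articles first
        "page-size": MAX_ARTICLES,  # ask for at most 10 results
        "show-fields": "body",      # include the article body in the response
        "api-key": api_key,
    }
    if date_from:
        params["from-date"] = date_from  # optional filter, YYYY-MM-DD
    return params


def to_record(result: dict) -> dict:
    """Reduce one API result to a lightweight record (field names assumed)."""
    # The body field is HTML, so the preview may contain markup.
    body = result.get("fields", {}).get("body", "")
    return {
        "webPublicationDate": result.get("webPublicationDate"),
        "webTitle": result.get("webTitle"),
        "webUrl": result.get("webUrl"),
        "content_preview": body[:PREVIEW_CHARS],
    }


def transform(results: list) -> list:
    """Keep at most MAX_ARTICLES results, each reduced by to_record."""
    return [to_record(r) for r in results[:MAX_ARTICLES]]
```

The parameters returned by `build_search_params` would be sent as a GET request to `https://content.guardianapis.com/search`, and `transform` would be applied to the `results` list in the response.
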
- Install uv

      curl -LsSf https://astral.sh/uv/install.sh | sh

- Clone & enter repo

      git clone https://github.com/brxzee/guardian-streamer.git
      cd guardian-streamer

- Add Guardian API key (never commit)

      echo "GUARDIAN_API_KEY=your_key_here" > .env

- Setup AWS credentials (SSO example)

      aws sso login --profile admin-sso   # one-time browser login
      export AWS_PROFILE=admin-sso        # or add to .env: AWS_PROFILE=admin-sso

- Deploy AWS with Terraform

      terraform init
      terraform apply -auto-approve

- Install dependencies & run

      uv sync
      uv run python main.py "machine learning" --date-from 2023-01-01

Output:
{
"record_sequence": "49622369389484470503062566976986553315237083692457033746",
"article_count": 10,
"article_titles": [
"Machine learning model spots Covid variants...",
"..."
]
}
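
As an illustration of how a summary like the one above might be produced, here is a minimal sketch of the publish step. It is an assumption, not the project's implementation: the function name `publish_articles`, the fixed partition key, and the `webTitle` field on each article are hypothetical. The `put_record` call and the `SequenceNumber` key in its response match the boto3 Kinesis client.

```python
import json


def publish_articles(client, stream_name: str, articles: list) -> dict:
    """Send all articles as one JSON batch record and summarise the result."""
    payload = json.dumps(articles).encode("utf-8")
    response = client.put_record(
        StreamName=stream_name,
        Data=payload,
        PartitionKey="guardian",  # hypothetical fixed partition key
    )
    return {
        "record_sequence": response["SequenceNumber"],
        "article_count": len(articles),
        # Assumes each article dict carries its title under "webTitle".
        "article_titles": [a.get("webTitle") for a in articles],
    }
```

With boto3 installed and AWS credentials configured, `client` would be `boto3.client("kinesis")`. Passing the client in as an argument keeps the function easy to unit-test with a stub.
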
Run the unit tests:

    uv run pytest -v

| Variable | Purpose | Default |
|---|---|---|
| `GUARDIAN_API_KEY` | Guardian open-platform key | required |
| `AWS_PROFILE` | AWS SSO profile (or keys) | None |
| `KINESIS_STREAM_NAME` | Target Kinesis stream | `guardian_content` |
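
The variables in the table could be read along the lines of the sketch below. The `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes any values from `.env` have already been loaded into the process environment; how the project itself loads `.env` is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    guardian_api_key: str
    aws_profile: Optional[str]
    kinesis_stream_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    api_key = env.get("GUARDIAN_API_KEY")
    if not api_key:
        # The table marks this variable as required.
        raise RuntimeError("GUARDIAN_API_KEY is required")
    return Settings(
        guardian_api_key=api_key,
        aws_profile=env.get("AWS_PROFILE"),  # defaults to None
        kinesis_stream_name=env.get("KINESIS_STREAM_NAME", "guardian_content"),
    )
```

Calling `load_settings({"GUARDIAN_API_KEY": "abc"})` yields a stream name of `guardian_content` and no AWS profile, matching the defaults in the table.
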