This is an AI-powered Streamlit app for summarizing Reddit posts and comparing the outputs of two transformer models: BART and T5-Large. It uses Google Gemini to generate a concise summary and highlight differences between the model-generated summaries.
- To learn more about the Hugging Face pipeline and the BART and T5-Large transformer models, you can follow along on Medium.com.
- The Reddit API was used to generate the data for this repo; you can read about it here.
- Upload CSVs with BART and T5-Large summaries for Reddit posts
- AI-generated summary & comparison using Google Gemini
- Visualize sentiment (with emoji) and number of comments (progress bar)
- Interactive UI built with Streamlit
- Batch processing of multiple posts
- Docker-ready for easy deployment
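The sentiment emoji and comment progress bar listed above could be driven by two small helpers. This is a minimal sketch, not the code in app.py: the function names, the sentiment labels (positive, neutral, negative), and the 500-comment cap are all assumptions made for illustration.

```python
# Hypothetical helpers; the labels and the default cap are assumptions.
SENTIMENT_EMOJI = {"positive": "😊", "neutral": "😐", "negative": "😞"}


def sentiment_emoji(label: str) -> str:
    """Return an emoji for a sentiment label, or a fallback for unknown labels."""
    return SENTIMENT_EMOJI.get(str(label).strip().lower(), "❓")


def comment_progress(num_comments: int, max_comments: int = 500) -> float:
    """Scale a comment count into the 0.0-1.0 range used by a progress bar."""
    if max_comments <= 0:
        return 0.0
    # Clamp so that negative or very large counts stay inside the valid range.
    return max(0.0, min(num_comments / max_comments, 1.0))
```

In a Streamlit page, the second value can be passed to `st.progress`, which accepts a float between 0.0 and 1.0.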
content_summarization/
│
├── app.py # Streamlit app entry point
├── summarize.py # Summarization logic and ContentSummary class
├── requirements.txt # Python dependencies
├── Dockerfile # Docker build instructions
├── .env.example # Example environment variables
├── README.md # Project documentation
│
├── data/ # Sample data
│
└── tests/ # (TODO) Unit tests
git clone https://github.com/gabya06/content_summarization.git
cd content_summarization
pip install -r requirements.txt
Copy .env.example to .env and add your Google Gemini API key:
GEMINI_API_KEY=your-key-here
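For reference, here is one way the key could be read at startup so that a missing or placeholder value fails early with a clear message. This is a sketch under assumptions: `get_gemini_api_key` is a hypothetical helper, not a function from this repo, and it assumes the variables in .env have already been loaded into the process environment (for example by a loader such as python-dotenv, or by Docker's `--env-file`).

```python
import os


def get_gemini_api_key(env=None) -> str:
    """Return GEMINI_API_KEY from the environment, failing early if it is unset."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the placeholder from .env.example as "not configured".
    if not key or key == "your-key-here":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; copy .env.example to .env and add your key."
        )
    return key
```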
streamlit run app.py
- Upload your BART and T5-Large CSV files in the sidebar.
- Select the number of posts to summarize.
- Click Summarize and Compare to view results.
Your CSVs should have at least these columns:
- title
- cleaned_text
- summary_bart (for BART CSV)
- summary_t5 (for T5-Large CSV)
- sentiment
- num_comments
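Before uploading, it can help to check that a file has the expected header. The sketch below uses only the standard library; `missing_columns` and the `"bart"` / `"t5"` keys are hypothetical names chosen for this example, and the app itself may validate its inputs differently.

```python
import csv
import io

# Columns shared by both files, plus one summary column per model.
COMMON_COLUMNS = {"title", "cleaned_text", "sentiment", "num_comments"}
REQUIRED_COLUMNS = {
    "bart": COMMON_COLUMNS | {"summary_bart"},
    "t5": COMMON_COLUMNS | {"summary_t5"},
}


def missing_columns(csv_file, model: str) -> list:
    """Return the required columns absent from the CSV header, sorted by name."""
    # Only the first row (the header) is read; an empty file yields no columns.
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS[model] - present)


# Example: a BART file whose header lacks num_comments.
sample = io.StringIO("title,cleaned_text,summary_bart,sentiment\n")
print(missing_columns(sample, "bart"))  # ['num_comments']
```

An empty list means the file has every required column. For a file on disk, pass an open file object, for example `open(path, newline="")`.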
docker build -t content-summarization .
docker run -p 8501:8501 --env-file .env content-summarization
Then open http://localhost:8501 in your browser.
Check out the app in Google Cloud!
- Edit summarize.py to change prompt logic or add more models.
- Tweak app.py for UI changes or new visualizations.
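As a starting point for experimenting with prompt logic, a comparison prompt can be assembled from one post's title and its two model summaries. This is an illustrative sketch only: `build_comparison_prompt` and its wording are assumptions, not the prompt used in summarize.py, and the call that sends the text to Gemini is deliberately left out.

```python
def build_comparison_prompt(title: str, summary_bart: str, summary_t5: str) -> str:
    """Assemble one text prompt asking for a concise summary and a comparison."""
    # Hypothetical wording; adjust the instructions to suit your own use case.
    lines = [
        "You are comparing two machine-generated summaries of a Reddit post.",
        f"Post title: {title}",
        f"BART summary: {summary_bart}",
        f"T5-Large summary: {summary_t5}",
        "1. Write one concise summary of the post.",
        "2. List the main differences between the two summaries.",
    ]
    return "\n".join(lines)
```

Keeping prompt construction in a pure function like this makes it easy to test wording changes without calling the API.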