Skip to content

An end-to-end text summarization project that processes, analyzes, and generates concise summaries from long textual content using advanced algorithms.

License

Notifications You must be signed in to change notification settings

Adity-star/Text-Summarization-Project

Repository files navigation

Text Summarizer Project

An advanced Natural Language Processing (NLP) application designed to automatically generate concise summaries from large blocks of text. With the ever-growing amount of information online and in documents, this project provides an efficient solution to extract key insights from text quickly and accurately.

website


📖 Description

The Text-Summarizer-Project is a versatile system capable of summarizing articles, research papers, news reports, and other lengthy documents. It uses state-of-the-art NLP techniques, combining extractive and abstractive summarization methods to produce concise and meaningful summaries.

Key features include:

  • Text Preprocessing: Cleans input text, removes noise, punctuation, and stopwords.
  • Sentence Extraction: Identifies key sentences representing main ideas.
  • Semantic Understanding: Leverages semantic analysis to comprehend meaning and relevance.
  • Summarization Techniques: Supports both extractive and abstractive summarization.
  • Length Control: Users can adjust summary length (short or comprehensive).
  • User Interface: Simple interface for text input and summary output.

Benefits:

  • Time-saving by quickly condensing long texts.
  • Helps researchers, students, and professionals extract key insights.
  • Useful for journalists, content creators, and language learners.
  • Can be integrated into search engines or knowledge management systems.

📂 Dataset

SAMSum Dataset (Hugging Face Link)

  • 16k messenger-like conversations with human-written summaries.
  • Covers dialogues between 2+ speakers, varying in style (informal, semi-formal, formal) with slang, emoticons, and typos.
  • Training/Validation/Test split:
    • Train: 14,732
    • Validation: 818
    • Test: 819

Example Instance:

{
  "id": "13818513",
  "summary": "Amanda baked cookies and will bring Jerry some tomorrow.",
  "dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"
}

Fields

  • dialogue: Text of conversation
  • summary: Human-written concise summary
  • id: Unique identifier

🧠 Model Information

PEGASUS (Google AI) – A state-of-the-art transformer-based model for abstractive summarization.

Key Features:

  • Transformer-based neural network
  • Trained on large datasets of text and code
  • Generates fluent and informative summaries
  • Outperforms other summarization models on various tasks

📝 Notes on Training

  • Initial training with 1 epoch due to low computing power
  • Achieved accuracy was low; further iterations are planned to improve performance

🚀 Key Features & Functionality

  • Preprocessing, cleaning, and noise removal
  • Extractive and abstractive summarization techniques
  • Semantic sentence ranking and selection
  • Adjustable summary length
  • Robust MLOps framework using MLflow and DVC
  • Deployment-ready FastAPI service with Docker and AWS integration

📈 Results

Metric Score
ROUGE-L 44.1
ROUGE-2 24.5
Baseline Δ +2

Outperforms standard baselines and demonstrates the effectiveness of hybrid PEGASUS-based summarization.


🛠️ Tech Stack

  • ML/DL: Hugging Face Transformers, PEGASUS
  • MLOps: MLflow, DVC, Docker
  • Backend/Deployment: FastAPI, AWS EC2, S3, ECR
  • CI/CD: GitHub Actions

💻 How to Run

  1. Clone the repository:
git clone https://github.com/praj2408/Text-Summarizer-Project.git
cd Text-Summarizer-Project
  1. create a conda environment.
conda create -n summary python==3.8 -y
conda activate summary
  1. Install dependencies.
pip install -r requirements.txt
  1. Run the FastAPI app locally.
python app.py
  1. Open your browser at http://localhost:8000 (or specified port) to interact with the service.

☁️ AWS CICD Deployment with GitHub Actions

1. AWS Setup:

  • Create IAM user with:

    • EC2 access
    • ECR (Elastic Container Registry) access
  • Assign policies:

    • AmazonEC2FullAccess
    • AmazonEC2ContainerRegistryFullAccess

2. ECR Deployment:

  • Create an ECR repository.
  • Build Docker image of the app:
docker build -t text-summarizer .
  • Push Docker image to ECR.

3. Ec2 Deployment.

  • Launch an EC2 instance
  • Install Docker on EC2
  curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
  • Pull the Docker Image.
  • Launch the Docker container on EC2.

GitHub Actions Integration

  • Configure secrets:

    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION
    • AWS_ECR_LOGIN_URI
    • ECR_REPOSITORY_NAME
  • Automate deployment with CI/CD workflow.


🌟 Contributing

Contributions are welcome! Feel free to fork the repository, raise issues, and submit pull requests.


📝 License

This project is licensed under the MIT License.

About

An end-to-end text summarization project that processes, analyzes, and generates concise summaries from long textual content using advanced algorithms.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published