Web Scraper with PDF Generation

A simple Python web scraper that extracts the title, paragraphs, and images from a webpage and converts the result to a PDF file.

Features

- Takes a URL from the user
- Scrapes the webpage for title, paragraphs, and images
- Saves the data to a text file
- Converts the data to a PDF file
- Implements error handling with try-except-else blocks
- Includes file handling operations
- Creates unique PDF files with timestamps
- Handles long titles in PDFs by adjusting font size or breaking into multiple lines
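
As an illustration of the scraping feature, the sketch below pulls the title, paragraph text, and image URLs out of an HTML string with BeautifulSoup. The function name `extract_content`, its `base_url` parameter, and the layout of the returned dictionary are assumptions made for this example; the code in web_scraper.py may be organized differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_content(html, base_url):
    """Return the title, paragraphs, and image URLs found in an HTML string.

    Hypothetical helper for illustration; not taken from web_scraper.py.
    """
    soup = BeautifulSoup(html, "html.parser")

    # A page may have no <title> tag, so fall back to an empty string.
    title = soup.title.get_text(strip=True) if soup.title else ""

    # Keep only paragraphs that contain visible text.
    paragraphs = []
    for tag in soup.find_all("p"):
        text = tag.get_text(strip=True)
        if text:
            paragraphs.append(text)

    # Skip <img> tags without a src and resolve relative paths.
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]

    return {"title": title, "paragraphs": paragraphs, "images": images}
```

Working on an HTML string rather than a live URL keeps the parsing step separate from the network request, so it can be tried on a saved page.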

Requirements

Python 3.6+
Required packages:
- requests
- beautifulsoup4
- fpdf

Installation

Clone this repository or download the files
Install the required packages:

pip install requests beautifulsoup4 fpdf

Usage

Run the script:

python web_scraper.py

When prompted, enter the complete URL of the website you want to scrape (including http:// or https://).
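
A quick check for a complete URL, sketched here with the standard library, can catch a missing scheme before any request is made. The helper name `is_complete_url` is an assumption for this example and may not match how web_scraper.py validates input.

```python
from urllib.parse import urlparse


def is_complete_url(url):
    """Return True if the URL has an http/https scheme and a host part.

    Hypothetical helper for illustration; not taken from web_scraper.py.
    """
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this check, `https://example.com/page` is accepted, while `example.com` (no scheme) is rejected.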

The script will:

- Scrape the website
- Save the content to a unique text file (e.g., scraped_data_20240615_123045.txt)
- Generate a unique PDF file (e.g., scraped_data_20240615_123045.pdf)

Each time you run the script, it will create new files rather than overwriting existing ones.
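
One way to produce file names in the pattern shown above is to format the current time with `strftime`. This is a minimal sketch: the function name `build_output_paths` and its parameters are assumptions for illustration, and the script itself may build the names differently.

```python
from datetime import datetime


def build_output_paths(now=None, prefix="scraped_data"):
    """Return (text_path, pdf_path) sharing one timestamp.

    Hypothetical helper for illustration; not taken from web_scraper.py.
    """
    now = now or datetime.now()
    # Year, month, day, then hour, minute, second, e.g. 20240615_123045.
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.txt", f"{prefix}_{stamp}.pdf"
```

Because the timestamp has one-second resolution, two runs started within the same second would produce the same names under this scheme.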

Error Handling

The script uses try-except-else blocks to handle errors in:

- Network connection issues
- Invalid URLs
- File I/O operations
- PDF generation
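
The try-except-else pattern for the network and file cases above might look like the following sketch. The function names `fetch_html` and `save_text`, the timeout value, and the printed messages are assumptions for this example, not the script's actual code.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page HTML, or None if the request fails.

    Hypothetical helper for illustration; not taken from web_scraper.py.
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # Treat 4xx/5xx responses as errors.
    except (requests.exceptions.MissingSchema, requests.exceptions.InvalidURL):
        print(f"Invalid URL (did you include http:// or https://?): {url}")
        return None
    except requests.exceptions.Timeout:
        print(f"The request to {url} timed out.")
        return None
    except requests.exceptions.ConnectionError:
        print(f"Could not connect to {url}.")
        return None
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    else:
        # Runs only when no exception was raised.
        return response.text


def save_text(path, text):
    """Write text to a file; return True on success, False on an I/O error."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except OSError as exc:
        print(f"Could not write {path}: {exc}")
        return False
    else:
        return True
```

Keeping the success path in the `else` branch means only the risky call sits inside `try`, so an unrelated bug later in the function is not mistaken for a network or file error.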

PDF Improvements

The PDF generation has been improved to:

- Prevent titles from extending beyond the page width
- Adjust the font size for very long titles
- Break exceptionally long titles into multiple lines

Note

This is a simple web scraper for educational purposes. Some websites may have measures to prevent scraping or might have complex structures that this basic scraper cannot handle effectively.
