WebScraper

Overview

The WebScraper project scrapes Craigslist for specific ticket listings, such as ACL (Austin City Limits) tickets. It filters listings based on search parameters and sends email notifications when new posts are found. The scraper uses Selenium for web scraping and Gmail OAuth2 for sending email notifications.

Features

Scrapes Craigslist for ACL ticket listings.
Filters listings based on criteria such as price and keywords.
Sends email notifications for new listings.
Automates the process using GitHub Actions to run every 15 minutes.
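
The price and keyword filtering can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the `Listing` class, the `matches` function, and the sample data are hypothetical, and the defaults simply mirror the example BASE_URL in the setup section (price 200 to 350, query terms "acl" and "one", excluding day names).

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical shape of one scraped post; the real script may differ."""
    post_id: str
    title: str
    price: int
    url: str


def matches(listing, min_price=200, max_price=350,
            required=("acl", "one"),
            excluded=("friday", "saturday", "sunday")):
    """Return True if the listing passes the price and keyword checks."""
    title = listing.title.lower()
    if not (min_price <= listing.price <= max_price):
        return False
    # Simple substring checks; every required word must appear in the title.
    if not all(word in title for word in required):
        return False
    # Reject the listing if any excluded word appears.
    return not any(word in title for word in excluded)


listings = [
    Listing("1", "ACL one day wristband", 250, "https://example.org/1"),
    Listing("2", "ACL one Saturday ticket", 250, "https://example.org/2"),
    Listing("3", "ACL one day pass", 500, "https://example.org/3"),
]
kept = [item for item in listings if matches(item)]
print([item.post_id for item in kept])
```

Only the first sample listing passes: the second is excluded by keyword and the third by price.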

Tech Stack

Python: Main language for web scraping and email notifications.
Selenium: Interacts with Craigslist and scrapes listings.
BeautifulSoup: Parses HTML content of Craigslist pages.
Loguru: Logging library for better debugging.
Gmail API: Sends email notifications via OAuth2.
GitHub Actions: Automates the scraping process every 15 minutes.
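
For the notification step, the Gmail API's message-send endpoint accepts an RFC 2822 message encoded as base64url in a `raw` field. The sketch below only builds that payload with the standard library. It does not perform the OAuth2 exchange or the HTTP call, and `build_raw_message` is a hypothetical helper name, not something taken from the project.

```python
import base64
from email.message import EmailMessage


def build_raw_message(sender, recipient, subject, body):
    """Build a {"raw": ...} payload of the kind the Gmail API expects."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    # The API expects the full message bytes, base64url-encoded.
    encoded = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"raw": encoded}


payload = build_raw_message(
    "your_gmail@gmail.com",
    "recipient_email@gmail.com",
    "New ACL listing",
    "https://example.org/listing/1",  # placeholder link for illustration
)
# Decode again to inspect what would be sent.
decoded = base64.urlsafe_b64decode(payload["raw"]).decode("utf-8")
print(decoded)
```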

Setup

Prerequisites

Python: Download and install Python.
ChromeDriver: Download ChromeDriver and place it in C:\WebDriver\, or update the path in the script.
Git: Download and install Git.

Installation

Clone the Repository:

git clone https://github.com/bradyespey/WebScaper.git
cd WebScaper

Install Dependencies:

Create a virtual environment (optional but recommended):
```
python -m venv venv
source venv/Scripts/activate  # On Windows (Git Bash); use venv/bin/activate on macOS/Linux
```
Install the required dependencies:
```
pip install -r requirements.txt
```

Set Up Environment Variables:

Create a .env file in the root of the project and add the following:

BASE_URL=https://austin.craigslist.org/search/sss?hasPic=1&max_price=350&min_price=200&query=acl%20one%20-saturday%20-friday%20-sunday
GMAIL_USER=your_gmail@gmail.com
GMAIL_CLIENT_ID=your_client_id
GMAIL_CLIENT_SECRET=your_client_secret
GMAIL_REFRESH_TOKEN=your_refresh_token
TOKEN_URI=https://oauth2.googleapis.com/token
FROM_EMAIL=your_gmail@gmail.com
TO_EMAIL=recipient_email@gmail.com
SEEN_POSTS=src/posts.txt
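
How the script reads this file is not shown here. A library such as python-dotenv is a common choice, but a rough standard-library sketch looks like the following. `parse_env` and `load_env` are hypothetical names. Note that each line is split on the first `=` only, since BASE_URL itself contains `=` characters in its query string.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def load_env(path=".env"):
    """Copy values into os.environ without overriding existing variables."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


sample = (
    "BASE_URL=https://austin.craigslist.org/search/sss?hasPic=1&max_price=350\n"
    "# comment lines are ignored\n"
    "TO_EMAIL=recipient_email@gmail.com\n"
)
config = parse_env(sample)
print(config["BASE_URL"])
```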

Run the Scraper:
```
python webscraper.py
```
This will run the scraper and output new listings.
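
Deciding which listings count as "new" requires remembering what was seen on earlier runs, which is presumably the role of the SEEN_POSTS file. The sketch below assumes one post ID per line; the file format and the function names `load_seen` and `record_new` are assumptions, not the project's confirmed behavior.

```python
import tempfile
from pathlib import Path


def load_seen(path):
    """Return the set of post IDs stored in the file, one per line."""
    path = Path(path)
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_new(post_ids, path):
    """Return IDs not seen before, in order, and append them to the file."""
    path = Path(path)
    seen = load_seen(path)
    new_ids = []
    for post_id in post_ids:
        if post_id not in seen:
            seen.add(post_id)
            new_ids.append(post_id)
    if new_ids:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            for post_id in new_ids:
                handle.write(post_id + "\n")
    return new_ids


# Demonstrate two consecutive runs against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    seen_file = Path(tmp) / "posts.txt"
    first = record_new(["101", "102"], seen_file)
    second = record_new(["102", "103"], seen_file)
    print(first, second)
```

On the second run only "103" is reported, because "102" was already recorded. On a hosted CI runner the file would need to be saved between runs for this to work; how the project handles that is not shown here.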

Automation with GitHub Actions

This project is automated using GitHub Actions to run the scraper every 15 minutes and send email notifications.

Setting Up GitHub Secrets

To securely store your Gmail credentials:

Go to your GitHub repository.
Navigate to Settings > Secrets and variables > Actions > New repository secret.
Add the following secrets:
- GMAIL_USER
- GMAIL_CLIENT_ID
- GMAIL_CLIENT_SECRET
- GMAIL_REFRESH_TOKEN
- TOKEN_URI
- FROM_EMAIL
- TO_EMAIL
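
A missing secret is easier to diagnose if the script fails early with a clear message. As a hedged sketch (the `missing_settings` helper is hypothetical and not part of the project), the names listed above could be checked at startup like this, treating empty values as missing:

```python
import os

REQUIRED = (
    "GMAIL_USER",
    "GMAIL_CLIENT_ID",
    "GMAIL_CLIENT_SECRET",
    "GMAIL_REFRESH_TOKEN",
    "TOKEN_URI",
    "FROM_EMAIL",
    "TO_EMAIL",
)


def missing_settings(environ=os.environ, required=REQUIRED):
    """Return the required names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Example with a plain dict standing in for the process environment.
example = {"GMAIL_USER": "your_gmail@gmail.com", "TO_EMAIL": ""}
print(missing_settings(example))
```

In a real run, calling `missing_settings()` with no arguments checks `os.environ`, and the script could exit with an error if the returned list is not empty.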

GitHub Actions Workflow

The GitHub Actions workflow is defined in .github/workflows/webscraper.yml. It runs every 15 minutes, executes the scraper, and sends emails when new listings are found.

License

This project is licensed under the MIT License. See the LICENSE file for details.
