The WebScraper project scrapes Craigslist for specific ticket listings, such as ACL (Austin City Limits) tickets. It filters listings based on search parameters and sends email notifications when new posts are found. The scraper uses Selenium for web scraping and Gmail OAuth2 for sending email notifications.
- Scrapes Craigslist for ACL ticket listings.
- Filters listings based on criteria such as price and keywords.
- Sends email notifications for new listings.
- Automates the process using GitHub Actions to run every 15 minutes.
- Python: Main language for web scraping and email notifications.
- Selenium: Interacts with Craigslist and scrapes listings.
- BeautifulSoup: Parses HTML content of Craigslist pages.
- Loguru: Logging library for better debugging.
- Gmail API: Sends email notifications via OAuth2.
- GitHub Actions: Automates the scraping process every 15 minutes.
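The price and keyword filtering mentioned above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Listing` record, its field names, the `matches` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical record for one scraped post (field names are assumptions)."""
    post_id: str
    title: str
    price: int
    url: str


def matches(listing, min_price, max_price, required, excluded):
    """Return True if the price is in range and the title passes the keyword checks."""
    title = listing.title.lower()
    if not (min_price <= listing.price <= max_price):
        return False
    # Every required keyword must appear in the title.
    if any(word.lower() not in title for word in required):
        return False
    # No excluded keyword may appear in the title.
    return not any(word.lower() in title for word in excluded)


# Made-up sample data for illustration only.
listings = [
    Listing("101", "ACL one day ticket", 250, "https://example.org/101"),
    Listing("102", "ACL Saturday wristband", 300, "https://example.org/102"),
    Listing("103", "ACL one day pass", 500, "https://example.org/103"),
]

kept = [
    item for item in listings
    if matches(item, 200, 350, ["acl"], ["saturday", "friday", "sunday"])
]
print([item.post_id for item in kept])  # ['101']
```

Listing 102 is dropped for the excluded keyword and 103 for its price, so only 101 remains.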
- Python: Download Python
- ChromeDriver: Download ChromeDriver and place it in C:\WebDriver\, or update the path in the script.
- Git: Download Git
- Clone the Repository:

      git clone https://github.com/bradyespey/WebScaper.git
      cd WebScaper

- Install Dependencies:

  Create a virtual environment (optional but recommended):

      python -m venv venv
      source venv/Scripts/activate # On Windows

  Install the required dependencies:

      pip install -r requirements.txt

- Set Up Environment Variables:

  Create a .env file in the root of the project and add the following:

      BASE_URL=https://austin.craigslist.org/search/sss?hasPic=1&max_price=350&min_price=200&query=acl%20one%20-saturday%20-friday%20-sunday
      GMAIL_USER=your_gmail@gmail.com
      GMAIL_CLIENT_ID=your_client_id
      GMAIL_CLIENT_SECRET=your_client_secret
      GMAIL_REFRESH_TOKEN=your_refresh_token
      TOKEN_URI=https://oauth2.googleapis.com/token
      FROM_EMAIL=your_gmail@gmail.com
      TO_EMAIL=recipient_email@gmail.com
      SEEN_POSTS=src/posts.txt

- Run the Scraper:

      python webscraper.py

  This will run the scraper and output new listings.
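The BASE_URL value from the .env settings carries the search parameters in its query string. As a rough sketch, it can be decoded with the standard library to see what is being searched for. The variable names below are illustrative, the URL is hardcoded here rather than read from the environment, and treating a leading `-` as "exclude this term" is an assumption about Craigslist's search syntax.

```python
from urllib.parse import parse_qs, urlsplit

# Same value as the BASE_URL example in the .env settings.
BASE_URL = (
    "https://austin.craigslist.org/search/sss"
    "?hasPic=1&max_price=350&min_price=200"
    "&query=acl%20one%20-saturday%20-friday%20-sunday"
)

# parse_qs percent-decodes values and returns a list per key.
params = parse_qs(urlsplit(BASE_URL).query)

min_price = int(params["min_price"][0])
max_price = int(params["max_price"][0])
query = params["query"][0]

# Split the query into plain terms and "-"-prefixed terms.
include = [term for term in query.split() if not term.startswith("-")]
exclude = [term[1:] for term in query.split() if term.startswith("-")]

print(min_price, max_price)  # 200 350
print(include)               # ['acl', 'one']
print(exclude)               # ['saturday', 'friday', 'sunday']
```

Changing the price range or keywords is then a matter of editing the corresponding query parameters in BASE_URL.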
This project is automated using GitHub Actions to run the scraper every 15 minutes and send email notifications.
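Because the scraper runs repeatedly, each run has to tell new posts apart from ones it has already reported. The SEEN_POSTS setting suggests a plain text file is used for this. The sketch below assumes one post ID per line, and the helper names `load_seen` and `record_new` are hypothetical; the project's real file format and functions may differ.

```python
import tempfile
from pathlib import Path


def load_seen(path):
    """Read previously seen post IDs, one per line; a missing file means none seen."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_new(path, scraped_ids):
    """Return IDs not seen before (in scraped order) and append them to the file."""
    seen = load_seen(path)
    new_ids = []
    for post_id in scraped_ids:
        if post_id not in seen:
            seen.add(post_id)
            new_ids.append(post_id)
    if new_ids:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as handle:
            handle.writelines(f"{post_id}\n" for post_id in new_ids)
    return new_ids


# Demo in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    seen_file = Path(tmp) / "posts.txt"
    first = record_new(seen_file, ["101", "102"])
    second = record_new(seen_file, ["102", "103"])

print(first)   # ['101', '102']
print(second)  # ['103']
```

Only the IDs returned by `record_new` would trigger a notification, so a post seen in an earlier run is not emailed twice.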
To securely store your Gmail credentials:
- Go to your GitHub repository.
- Navigate to Settings > Secrets and variables > Actions > New repository secret.
- Add the following secrets:
  - GMAIL_USER
  - GMAIL_CLIENT_ID
  - GMAIL_CLIENT_SECRET
  - GMAIL_REFRESH_TOKEN
  - TOKEN_URI
  - FROM_EMAIL
  - TO_EMAIL
The GitHub Actions workflow is defined in .github/workflows/webscraper.yml. It runs every 15 minutes, executes the scraper, and sends emails when new listings are found.
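To illustrate how the Gmail credentials listed above might fit together, here is a sketch using only the standard library. It builds, without sending, a standard OAuth2 refresh-token request for TOKEN_URI, and encodes an email in the base64url form that the Gmail API accepts as a message's `raw` field. The function names and placeholder values are hypothetical, and the project's own script may use a client library instead.

```python
import base64
from email.message import EmailMessage
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_uri, client_id, client_secret, refresh_token):
    """Build (but do not send) an OAuth2 refresh-token POST request."""
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(
        token_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_raw_message(sender, recipient, subject, text):
    """Build an email and return it base64url-encoded."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(text)
    return base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")


# Placeholder values; real ones would come from the environment or secrets.
request = build_token_request(
    "https://oauth2.googleapis.com/token",
    "your_client_id",
    "your_client_secret",
    "your_refresh_token",
)
raw = build_raw_message(
    "your_gmail@gmail.com",
    "recipient_email@gmail.com",
    "New ACL listing",
    "A new listing matched your search.",
)

print(request.get_method(), request.full_url)
```

Sending `request` with `urllib.request.urlopen` would return a JSON response containing a short-lived access token, which then authorizes the call that sends `raw`. Keeping the refresh token in a repository secret means it never has to appear in the code.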
This project is licensed under the MIT License. See the LICENSE file for details.