YP Scraper

A Node.js application for scraping business information from YellowPages.com (US only). It offers both a command-line and a web interface, with Firebase authentication and clean URL routing.

Features

User Authentication:
- Firebase email/password login
- Password reset functionality
Search Capabilities:
- Search for businesses by type and location
- Collect information like:
  - Business names
  - Phone numbers
  - Websites
  - Complete addresses
Data Management:
- Save results as JSON or CSV files
- Browse, preview, and manage saved results
- Mobile-friendly web interface
- Command-line interface for scripts and automation

Installation

Standard Installation

Clone the repository:

git clone git@github.com:DevManSam777/yp-scraper-docker.git
cd yp-scraper-docker

Install dependencies:

npm install

Make sure the output directories exist:

mkdir -p json_results csv_results

Docker Installation (From Docker Desktop)

Clone the repository:

git clone git@github.com:DevManSam777/yp-scraper-docker.git
cd yp-scraper-docker

Run using Docker Compose:

docker-compose up

This will:

- Build the Docker image with all dependencies (including Chrome)
- Create and start the container
- Mount the necessary volumes for file storage
- Map port 3000 to the container

Firebase Authentication Setup

- Create a Firebase project at console.firebase.google.com
- Enable Email/Password authentication
- Register a web app in your Firebase project
- Add users manually from the Firebase console, since the web app intentionally offers no sign-up
- Update the Firebase configuration (public keys) in:
  - public/login.html
  - public/index.html (logout functionality)
- Add your development and production domains to Firebase authorized domains
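
As a rough illustration of the configuration step above, a Firebase web app config is a plain object of public keys that you copy from the Firebase console. The values below are placeholders, and `findUnsetKeys` is a hypothetical helper (not part of this project) that flags entries you have not filled in yet.

```javascript
// Placeholder values: copy the real ones from your Firebase project settings.
const firebaseConfig = {
  apiKey: "YOUR_API_KEY",
  authDomain: "your-project.firebaseapp.com",
  projectId: "your-project",
  appId: "YOUR_APP_ID",
};

// Hypothetical helper: returns the keys whose value is empty or still
// starts with the "YOUR_" placeholder prefix.
function findUnsetKeys(config) {
  return Object.entries(config)
    .filter(([, value]) => !value || String(value).startsWith("YOUR_"))
    .map(([key]) => key);
}

console.log("Keys still to fill in:", findUnsetKeys(firebaseConfig));
```

The same object would need to be present in both public/login.html and public/index.html so that login and logout talk to the same Firebase project.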

Usage

Command-line Interface

Run the scraper in interactive mode:

npm run search

You'll be prompted to enter:

- What you're looking for (e.g., "pizza")
- Where (e.g., "Los Angeles, CA")
- Number of results to collect
- How to save the results (JSON or CSV)

Web Interface

Start the web server:

npm start

Open your browser and go to:

http://localhost:3000

Log in with your Firebase credentials
Use the interface to:
- Configure and start searches
- Monitor real-time progress
- View and manage results
- Preview and download files

Deployment

The application can be deployed to any platform that supports Docker containers:

- Push your code to a Git repository
- Deploy the Docker container to your preferred hosting platform
- Add your deployment domain to Firebase authorized domains

File Storage Note

When deployed, consider your file storage strategy:

- Docker containers typically use ephemeral storage
- Files will be lost during container restarts or redeployments
- For production, consider:
  - Downloading files immediately after generation
  - Mounting persistent volumes
  - Using cloud storage integrations

Web Interface Features

The web interface provides:

- Clean URL Routing: User-friendly URLs without .html extensions:
  - /login - Authentication page
  - / - Main application (protected)
- Login Screen: Secure access to the application
- Search Tab: Configure and run searches
- Results Tab: View detailed business information
- Files Tab: Manage saved JSON and CSV files
- Real-time Progress: Monitor search status
- File Preview: Quick view of saved results
- Responsive Design: Works on mobile devices
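
The clean URL routing listed above can be pictured as a small lookup from extension-free paths to the HTML files under public/. This is a minimal sketch of the idea only. The route table and the `resolveCleanUrl` name are assumptions, not the code in web-server.js, and the authentication check for the protected page is not shown.

```javascript
// Assumed route table: clean path -> static file under public/.
const CLEAN_ROUTES = new Map([
  ["/", "index.html"],
  ["/login", "login.html"],
]);

// Returns the file name for a clean path, or null when the path is unknown.
function resolveCleanUrl(requestPath) {
  // Drop any query string, then any trailing slash ("/login/" -> "/login").
  const pathOnly = requestPath.split("?")[0];
  const normalized =
    pathOnly.length > 1 ? pathOnly.replace(/\/+$/, "") : pathOnly;
  return CLEAN_ROUTES.get(normalized) ?? null;
}

for (const path of ["/", "/login", "/login/", "/login.html"]) {
  console.log(path, "->", resolveCleanUrl(path));
}
```

In this sketch a request for /login.html resolves to null, so only the extension-free form is served.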

Output Format

JSON Example

[
  {
    "businessName": "Pizza Place",
    "businessType": "Pizza, Italian Restaurant",
    "phone": "(555)123-4567",
    "website": "https://example.com",
    "streetAddress": "123 Main St",
    "city": "Los Angeles",
    "state": "CA",
    "zipCode": "90001"
  }
]

CSV Format

Results are saved with the following columns:

- Business Name
- Business Type
- Phone
- Website
- Street Address
- City
- State
- ZIP Code
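
To show how the JSON fields relate to the CSV columns, here is a small conversion sketch. It assumes each column maps to the JSON field of the matching name and uses common CSV quoting rules. The project's own CSV writer may differ, and `toCsv` and `escapeCsv` are illustrative names.

```javascript
// Column header -> JSON field, in the column order listed above.
const COLUMNS = [
  ["Business Name", "businessName"],
  ["Business Type", "businessType"],
  ["Phone", "phone"],
  ["Website", "website"],
  ["Street Address", "streetAddress"],
  ["City", "city"],
  ["State", "state"],
  ["ZIP Code", "zipCode"],
];

// Quote a value when it contains a comma, a double quote, or a line break;
// embedded double quotes are doubled. Missing values become empty cells.
function escapeCsv(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build the header row followed by one row per record.
function toCsv(records) {
  const header = COLUMNS.map(([title]) => title).join(",");
  const rows = records.map((record) =>
    COLUMNS.map(([, field]) => escapeCsv(record[field])).join(",")
  );
  return [header, ...rows].join("\n");
}

const sample = [
  {
    businessName: "Pizza Place",
    businessType: "Pizza, Italian Restaurant",
    phone: "(555)123-4567",
    website: "https://example.com",
    streetAddress: "123 Main St",
    city: "Los Angeles",
    state: "CA",
    zipCode: "90001",
  },
];

console.log(toCsv(sample));
```

Note that the business type contains a comma, so that cell is wrapped in double quotes in the output.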

Important Notes

Neither this application nor its creator is affiliated in any way with YellowPages.com.

- For educational and demonstration purposes only
- Only works with YellowPages.com
- Please refer to the YellowPages.com Terms of Service before using
- Might break if the website structure changes
- Use carefully and responsibly
- Use at your own discretion and risk

Limitations

- Limited to ~1000 results per search
- Rotating proxies recommended for extensive use
- Consider file storage persistence for production deployments
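
If you script around the scraper, it may help to keep the requested result count within the limit noted above. The sketch below is one way to do that. The `clampResultCount` name and the fallback of 50 are assumptions, not values taken from the project.

```javascript
// Approximate per-search ceiling noted in the limitations above.
const MAX_RESULTS = 1000;

// Parse a user-supplied count; fall back when it is not a positive integer,
// and cap it at MAX_RESULTS otherwise. The fallback of 50 is an assumption.
function clampResultCount(input, fallback = 50) {
  const parsed = Number.parseInt(input, 10);
  if (Number.isNaN(parsed) || parsed < 1) return fallback;
  return Math.min(parsed, MAX_RESULTS);
}

for (const input of ["250", "5000", "abc"]) {
  console.log(JSON.stringify(input), "->", clampResultCount(input));
}
```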

Customization

- Port: Change the web server port by setting the PORT environment variable
- Results Limit: Modify the maximum results in puppeteer-scraper-module.js and adjust UI values in public/app/index.html
- URL Routing: The application uses clean URL paths without file extensions
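
The PORT setting above is typically read along these lines. This is a minimal sketch that assumes a default of 3000, the port used elsewhere in this README. `resolvePort` is an illustrative name, not a function from web-server.js.

```javascript
// Read the port from the environment, falling back to 3000 when PORT is
// unset or is not a valid TCP port number.
function resolvePort(env = process.env, fallback = 3000) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

console.log(`Web server would listen on port ${resolvePort()}`);
```

With this approach, starting the server as `PORT=8080 npm start` would serve the interface at http://localhost:8080 instead.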

How It Works

The scraper uses Puppeteer with stealth plugins to navigate YellowPages search results and extract business information. The application architecture includes:

- puppeteer-scraper-module.js: Core scraper functionality
- puppeteer-scraper-cli.js: Command-line interface
- web-server.js: Web server & API endpoints with clean URL routing
- public/index.html: Web interface
- public/login.html: Authentication interface
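
To make the navigation step more concrete, the sketch below builds a search URL from a search term and a location. The URL pattern and the `search_terms`, `geo_location_terms`, and `page` parameter names are assumptions about the site's public search page, and `buildSearchUrl` is an illustrative name. Neither is confirmed to match puppeteer-scraper-module.js.

```javascript
// Assumed URL pattern; verify against the live site before relying on it.
function buildSearchUrl(searchTerm, location, page = 1) {
  const url = new URL("https://www.yellowpages.com/search");
  url.searchParams.set("search_terms", searchTerm);
  url.searchParams.set("geo_location_terms", location);
  // Only add the page parameter when moving past the first results page.
  if (page > 1) url.searchParams.set("page", String(page));
  return url.toString();
}

console.log(buildSearchUrl("pizza", "Los Angeles, CA"));
console.log(buildSearchUrl("pizza", "Los Angeles, CA", 2));
```

A scraper would then open each such URL in a Puppeteer page and read the listing fields from the rendered results.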
