A Node.js application for scraping business information from YellowPages.com (US only). Available with both command-line and web interfaces, now featuring Firebase authentication and clean URL routing.
- User Authentication:
- Firebase email/password login
- Password reset functionality
- Search Capabilities:
- Search for businesses by type and location
- Collect information like:
- Business names
- Phone numbers
- Websites
- Complete addresses
- Data Management:
- Save results as JSON or CSV files
- Browse, preview, and manage saved results
- Mobile-friendly web interface
- Command-line interface for scripts and automation
- Clone the repository:
git clone git@github.com:DevManSam777/yp-scraper-docker.git
cd yp-scraper-docker
- Install dependencies:
npm install
- Make sure the output directories exist:
mkdir -p json_results csv_results
To run with Docker instead:
- Clone the repository:
git clone git@github.com:DevManSam777/yp-scraper-docker.git
cd yp-scraper-docker
- Run using Docker Compose:
docker-compose up
This will:
- Build the Docker image with all dependencies (including Chrome)
- Create and start the container
- Mount the necessary volumes for file storage
- Map port 3000 to the container
- Create a Firebase project at console.firebase.google.com
- Enable Email/Password authentication
- Register a web app in your Firebase project
- Add users manually in the Firebase console, since the web app intentionally does not offer sign-up
- Update the Firebase configuration (public keys) in:
- public/login.html
- public/index.html (logout functionality)
- Add your development and production domains to Firebase authorized domains
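The configuration step above could look roughly like the following sketch. The field names are a subset of the usual Firebase web app config object; every value is a placeholder to replace with the values from your own Firebase console, and `missingFirebaseKeys` is a hypothetical helper for illustration, not part of this project or the Firebase SDK.

```javascript
// Placeholder Firebase web config (assumed shape); copy the real values
// from your project's settings in the Firebase console.
const firebaseConfig = {
  apiKey: "YOUR_API_KEY",
  authDomain: "YOUR_PROJECT_ID.firebaseapp.com",
  projectId: "YOUR_PROJECT_ID",
  appId: "YOUR_APP_ID",
};

// Hypothetical helper: list the keys that are empty or still placeholders.
function missingFirebaseKeys(config) {
  const required = ["apiKey", "authDomain", "projectId", "appId"];
  return required.filter((key) => {
    const value = config[key];
    return typeof value !== "string" || value === "" || value.startsWith("YOUR_");
  });
}

// With the untouched placeholders above, all four keys are reported.
console.log(missingFirebaseKeys(firebaseConfig));
```

A check like this can be run before deploying to catch a config block that was pasted into one HTML file but not the other.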
Run the scraper in interactive mode:
npm run search
You'll be prompted to enter:
- What you're looking for (e.g., "pizza")
- Where (e.g., "Los Angeles, CA")
- Number of results to collect
- How to save the results (JSON or CSV)
- Start the web server:
npm start
- Open your browser and go to:
http://localhost:3000
- Log in with your Firebase credentials
- Use the interface to:
- Configure and start searches
- Monitor real-time progress
- View and manage results
- Preview and download files
The application can be deployed to any platform that supports Docker containers:
- Push your code to a Git repository
- Deploy the Docker container to your preferred hosting platform
- Add your deployment domain to Firebase authorized domains
When deployed, consider your file storage strategy:
- Docker containers typically use ephemeral storage
- Files will be lost during container restarts or redeployments
- For production, consider:
- Downloading files immediately after generation
- Mounting persistent volumes
- Using cloud storage integrations
The web interface provides:
- Clean URL Routing: User-friendly URLs without .html extensions:
- /login: Authentication page
- /: Main application (protected)
- Login Screen: Secure access to the application
- Search Tab: Configure and run searches
- Results Tab: View detailed business information
- Files Tab: Manage saved JSON and CSV files
- Real-time Progress: Monitor search status
- File Preview: Quick view of saved results
- Responsive Design: Works on mobile devices
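As a rough illustration of the clean URL routing listed above, the sketch below maps extension-free paths to the HTML files in `public/`. It assumes only the two routes named in this README; `CLEAN_ROUTES` and `resolveCleanUrl` are hypothetical names and may not match what `web-server.js` actually does.

```javascript
// Assumed mapping from clean paths to files in public/.
const CLEAN_ROUTES = {
  "/": "index.html",
  "/login": "login.html",
};

// Hypothetical helper: return the file for a clean URL, or null if unknown.
function resolveCleanUrl(pathname) {
  // Treat "/login/" the same as "/login", but leave "/" untouched.
  const normalized = pathname.length > 1 ? pathname.replace(/\/+$/, "") : pathname;
  return Object.prototype.hasOwnProperty.call(CLEAN_ROUTES, normalized)
    ? CLEAN_ROUTES[normalized]
    : null;
}

console.log(resolveCleanUrl("/login"));
```

Returning `null` for unknown paths leaves it to the caller to decide between a 404 and a redirect to `/login`.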
[
  {
    "businessName": "Pizza Place",
    "businessType": "Pizza, Italian Restaurant",
    "phone": "(555)123-4567",
    "website": "https://example.com",
    "streetAddress": "123 Main St",
    "city": "Los Angeles",
    "state": "CA",
    "zipCode": "90001"
  }
]
CSV results are saved with the following columns:
- Business Name
- Business Type
- Phone
- Website
- Street Address
- City
- State
- ZIP Code
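To show how the JSON fields relate to the CSV columns, here is a minimal sketch that converts result objects into CSV text. The column-to-field pairing is inferred from the example record above, and `COLUMNS`, `escapeCsv`, and `toCsv` are hypothetical names; the project's own export code may differ.

```javascript
// Column headers paired with the JSON field names from the example record.
const COLUMNS = [
  ["Business Name", "businessName"],
  ["Business Type", "businessType"],
  ["Phone", "phone"],
  ["Website", "website"],
  ["Street Address", "streetAddress"],
  ["City", "city"],
  ["State", "state"],
  ["ZIP Code", "zipCode"],
];

// Quote a value when it contains a comma, a double quote, or a line break.
function escapeCsv(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turn an array of result objects into CSV text.
function toCsv(records) {
  const header = COLUMNS.map(([title]) => title).join(",");
  const rows = records.map((record) =>
    COLUMNS.map(([, field]) => escapeCsv(record[field])).join(",")
  );
  return [header, ...rows].join("\n");
}

console.log(toCsv([{ businessName: "Pizza Place", businessType: "Pizza, Italian Restaurant" }]));
```

Quoting matters here because values such as "Pizza, Italian Restaurant" contain commas that would otherwise split one field into two columns.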
- Neither this application nor its creator is affiliated in any way with YellowPages.com
- For educational and demonstration purposes only
- Only works with YellowPages.com
- Please refer to YellowPages.com Terms of Service before using
- Might break if the website structure changes
- Use carefully and responsibly
- Use at your own discretion and risk
- Limited to ~1000 results per search
- Rotating proxies recommended for extensive use
- Consider file storage persistence for production deployments
- Port: Change the web server port by setting the PORT environment variable
- Results Limit: Modify the maximum results in puppeteer-scraper-module.js and adjust UI values in public/app/index.html
- URL Routing: The application uses clean URL paths without file extensions
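The PORT setting could be read along the lines of the sketch below. The fallback of 3000 is an assumption based on the port used elsewhere in this README, and `resolvePort` is a hypothetical helper rather than the code in `web-server.js`.

```javascript
// Hypothetical helper: pick the port from an environment object,
// falling back to 3000 when PORT is unset or not a valid port number.
function resolvePort(env, fallback = 3000) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

console.log(resolvePort(process.env));
```

With a helper like this, starting the server as `PORT=8080 npm start` would listen on 8080, while an unset or malformed value falls back to the default.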
The scraper uses Puppeteer with stealth plugins to navigate YellowPages search results and extract business information. The application architecture includes:
- puppeteer-scraper-module.js: Core scraper functionality
- puppeteer-scraper-cli.js: Command-line interface
- web-server.js: Web server & API endpoints with clean URL routing
- public/index.html: Web interface
- public/login.html: Authentication interface

