Perplexity.ai Scraper Demo

A Python script that queries Perplexity.ai and extracts AI-generated responses using the ScrapingAnt API for browser rendering.

Features

Browser-based interaction with Perplexity.ai via ScrapingAnt API
Uses residential proxies to bypass Cloudflare protection
Direct search URL approach for reliable results
Configurable response wait time
HTML response saved for debugging
Command-line interface

Requirements

Python 3.7+
ScrapingAnt API key

Installation

Clone this repository or download the files

Create a virtual environment (recommended):

python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
pip install requests beautifulsoup4
```
Add your ScrapingAnt API key to perplexity_scraper.py:
```
API_KEY = "<YOUR_SCRAPINGANT_API_KEY>"
```
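If you would rather not keep the key in the source file, one option is to read it from the environment. This is a minimal sketch, not how the script itself works: the `SCRAPINGANT_API_KEY` variable name and the `load_api_key` helper are hypothetical and do not come from `perplexity_scraper.py`.

```python
import os

# Placeholder string as it appears in the README.
PLACEHOLDER = "<YOUR_SCRAPINGANT_API_KEY>"


def load_api_key(env=None):
    """Return the API key from a mapping (defaults to os.environ).

    SCRAPINGANT_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("SCRAPINGANT_API_KEY", PLACEHOLDER).strip()
    # Fail early if the key is missing or still the placeholder.
    if not key or key == PLACEHOLDER:
        raise ValueError("ScrapingAnt API key is not set")
    return key
```

With this in place, `API_KEY = load_api_key()` could stand in for the hard-coded assignment.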

Obtaining ScrapingAnt API Token

Go to ScrapingAnt
Click Sign Up or Get Started Free
Create an account using email or Google/GitHub OAuth
After registration, navigate to your Dashboard
Find your API key in the API Key section
Copy the API key and paste it into perplexity_scraper.py

Note: ScrapingAnt offers a free tier with limited API credits. This script uses residential proxies, which may consume more credits per request. For production use, consider upgrading to a paid plan.

Usage

Basic usage:

python perplexity_scraper.py "What is the capital of France?"

Custom wait time (in milliseconds):

python perplexity_scraper.py "Explain quantum computing" --wait 50000

All options:

python perplexity_scraper.py --help

Command Line Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `query` | The question/prompt to send to Perplexity | Required |
| `--wait` | Time to wait for the response, in milliseconds | 40000 |
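The arguments above map naturally onto `argparse`. The following is a sketch of what such a parser could look like, assuming the script uses `argparse`; the `build_parser` name and the description string are illustrative, not taken from `perplexity_scraper.py`.

```python
import argparse


def build_parser():
    """Build a parser with the two documented arguments."""
    parser = argparse.ArgumentParser(
        description="Query Perplexity.ai through the ScrapingAnt API."
    )
    # Positional and therefore required.
    parser.add_argument("query", help="The question/prompt to send to Perplexity")
    # Optional wait time in milliseconds; 40000 is the documented default.
    parser.add_argument(
        "--wait",
        type=int,
        default=40000,
        help="Wait time for response in ms (default: %(default)s)",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["Explain quantum computing", "--wait", "50000"])
print(args.query, args.wait)
```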

Output

The script will:

Print the query and configuration
Save the raw HTML response to perplexity_response.html
Print the extracted Perplexity response
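The save-and-extract steps can be sketched as follows. This is an assumption-laden illustration: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup, and it collects all visible text rather than targeting Perplexity's answer container, whose selector this README does not state. The `save_and_extract` name is hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def save_and_extract(html, path="perplexity_response.html"):
    """Write the raw HTML to disk for debugging and return its visible text."""
    Path(path).write_text(html, encoding="utf-8")
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)
```

Saving the raw HTML before parsing means a failed extraction still leaves something to inspect.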

How It Works

Constructs a direct search URL: https://www.perplexity.ai/search?q=<query>
Sends the request to the ScrapingAnt API with browser rendering enabled
Routes the request through residential proxies to bypass Cloudflare protection
Waits for Perplexity to generate a response (controlled by --wait)
Receives the rendered HTML and parses it to extract the answer text
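The first steps of this flow can be sketched without touching the network. The endpoint and the parameter names (`url`, `x-api-key`, `browser`, `proxy_type`) are assumptions based on ScrapingAnt's general API and should be checked against its documentation. The helper names are hypothetical. How the script passes the wait time to ScrapingAnt is not described here, so it is left out.

```python
from urllib.parse import urlencode

PERPLEXITY_SEARCH = "https://www.perplexity.ai/search"
# Assumed ScrapingAnt endpoint; confirm against the ScrapingAnt API docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_search_url(query):
    """Return the direct Perplexity search URL with the query URL-encoded."""
    return f"{PERPLEXITY_SEARCH}?{urlencode({'q': query})}"


def build_request_params(query, api_key):
    """Return query parameters for a ScrapingAnt request (names are assumptions)."""
    return {
        "url": build_search_url(query),
        "x-api-key": api_key,
        "browser": "true",  # ask for browser rendering
        "proxy_type": "residential",  # use residential proxies
    }


params = build_request_params(
    "What is the capital of France?", "<YOUR_SCRAPINGANT_API_KEY>"
)
print(params["url"])
# https://www.perplexity.ai/search?q=What+is+the+capital+of+France%3F
```

The resulting dictionary could then be sent with something like `requests.get(SCRAPINGANT_ENDPOINT, params=params)`, using the `requests` dependency installed earlier.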

Troubleshooting

No response extracted: Increase the --wait value; longer answers take more time to generate
API errors: Verify that your API key is correct and has credits available
Empty responses: Inspect perplexity_response.html to see what the page actually returned
Cloudflare blocks: The script uses residential proxies by default, which should bypass most blocks

License

MIT License

kami4ka/PerplexityExtractionDemo

Folders and files

Latest commit

History

Repository files navigation

Perplexity.ai Scraper Demo

Features

Requirements

Installation

Obtaining ScrapingAnt API Token

Usage

Basic usage:

Custom wait time (in milliseconds):

All options:

Command Line Arguments

Output

How It Works

Troubleshooting

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Languages