
kami4ka/PerplexityExtractionDemo

Perplexity.ai Scraper Demo

A Python script that queries Perplexity.ai and extracts AI-generated responses using the ScrapingAnt API for browser rendering.

Features

  • Browser-based interaction with Perplexity.ai via the ScrapingAnt API
  • Residential proxies to bypass Cloudflare protection
  • Direct search URL approach (https://www.perplexity.ai/search?q=<query>) for reliable results
  • Configurable response wait time (--wait, in milliseconds)
  • Raw HTML response saved to perplexity_response.html for debugging
  • Command-line interface

Requirements

  • Python 3.7+
  • ScrapingAnt API key

Installation

  1. Clone this repository or download the files

  2. Create a virtual environment (recommended):

    python3 -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:

    pip install requests beautifulsoup4
  4. Add your ScrapingAnt API key to perplexity_scraper.py:

    API_KEY = "<YOUR_SCRAPINGANT_API_KEY>"
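The step above hard-codes the key in the script. If you would rather keep it out of the source file, here is a minimal sketch of an environment-variable fallback. `SCRAPINGANT_API_KEY` and `get_api_key` are hypothetical names; the script as described reads only the hard-coded `API_KEY`.

```python
import os

# Hard-coded value, as in the installation step; replace the placeholder with your key.
API_KEY = "<YOUR_SCRAPINGANT_API_KEY>"


def get_api_key() -> str:
    # Hypothetical: prefer the SCRAPINGANT_API_KEY environment variable, else use API_KEY.
    key = os.environ.get("SCRAPINGANT_API_KEY", API_KEY)
    # Refuse to run with the unedited "<...>" placeholder or an empty value.
    if not key or key.startswith("<"):
        raise RuntimeError(
            "Set SCRAPINGANT_API_KEY or edit API_KEY in perplexity_scraper.py"
        )
    return key
```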

Obtaining a ScrapingAnt API Key

  1. Go to ScrapingAnt
  2. Click Sign Up or Get Started Free
  3. Create an account using email or Google/GitHub OAuth
  4. After registration, navigate to your Dashboard
  5. Find your API key in the API Key section
  6. Copy the API key and paste it into perplexity_scraper.py

Note: ScrapingAnt offers a free tier with limited API credits. This script uses residential proxies, which may consume more credits per request. For production use, consider upgrading to a paid plan.

Usage

Basic usage:

python perplexity_scraper.py "What is the capital of France?"

Custom wait time (in milliseconds):

python perplexity_scraper.py "Explain quantum computing" --wait 50000

All options:

python perplexity_scraper.py --help

Command Line Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| query | The question/prompt to send to Perplexity | Required |
| --wait | Wait time for the response, in milliseconds | 40000 |
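The two arguments in the table map naturally onto `argparse`. The sketch below shows one way such a parser could look; it is an assumption, not the script's actual code, and `parse_args` is a hypothetical name.

```python
import argparse


def parse_args(argv=None):
    # Mirrors the documented CLI: a required positional query and an optional --wait in ms.
    parser = argparse.ArgumentParser(
        description="Query Perplexity.ai through the ScrapingAnt API."
    )
    parser.add_argument("query", help="The question/prompt to send to Perplexity")
    parser.add_argument(
        "--wait",
        type=int,
        default=40000,
        help="Wait time for the response in milliseconds (default: 40000)",
    )
    # argv=None makes argparse read sys.argv; pass a list to parse explicitly.
    return parser.parse_args(argv)
```

For example, `parse_args(["Explain quantum computing", "--wait", "50000"])` yields `query="Explain quantum computing"` and `wait=50000`; omitting `--wait` falls back to 40000.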

Output

The script will:

  1. Print the query and configuration
  2. Save the raw HTML response to perplexity_response.html
  3. Print the extracted Perplexity response
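The save-then-extract steps above could look roughly like this with BeautifulSoup (installed earlier as beautifulsoup4). The `div.prose` selector and the `save_and_extract` helper are hypothetical: Perplexity's markup is not documented here and may change, so inspect perplexity_response.html to find the element that actually holds the answer.

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical selector for the answer container; adjust after inspecting the saved HTML.
ANSWER_SELECTOR = "div.prose"


def save_and_extract(html: str, path: str = "perplexity_response.html") -> str:
    # Keep the raw HTML on disk so an empty result can be debugged afterwards.
    Path(path).write_text(html, encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.select(ANSWER_SELECTOR)
    # Join the text of every matching block; returns "" when nothing matches.
    return "\n\n".join(block.get_text(" ", strip=True) for block in blocks)
```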

How It Works

  1. Constructs a direct search URL: https://www.perplexity.ai/search?q=<query>
  2. Sends the request to the ScrapingAnt API with browser rendering enabled and residential proxies, which help bypass Cloudflare protection
  3. Waits for Perplexity to generate a response (controlled by --wait)
  4. Receives the rendered HTML and parses it to extract the answer text
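The request flow described above can be sketched with `requests`. The endpoint `https://api.scrapingant.com/v2/general` and the `x-api-key`, `browser` and `proxy_type` parameters are assumptions based on ScrapingAnt's public v2 API and should be checked against its documentation. The helper names and the 120-second timeout are choices made for this sketch only, and how the script forwards the `--wait` value to ScrapingAnt is deliberately left out.

```python
from urllib.parse import urlencode

import requests

# Assumed ScrapingAnt v2 endpoint; verify against the ScrapingAnt docs.
SCRAPINGANT_ENDPOINT = "https://api.scrapingant.com/v2/general"


def build_search_url(query: str) -> str:
    # Direct Perplexity search URL with the query URL-encoded.
    return "https://www.perplexity.ai/search?" + urlencode({"q": query})


def build_request_params(query: str, api_key: str) -> dict:
    # Assumed parameter names: render in a real browser through a residential proxy.
    return {
        "url": build_search_url(query),
        "x-api-key": api_key,
        "browser": "true",
        "proxy_type": "residential",
    }


def fetch_html(query: str, api_key: str, timeout_s: float = 120.0) -> str:
    # ScrapingAnt loads the page and returns the rendered HTML in the response body.
    # Wait-time handling (--wait) is not shown in this sketch.
    response = requests.get(
        SCRAPINGANT_ENDPOINT,
        params=build_request_params(query, api_key),
        timeout=timeout_s,
    )
    response.raise_for_status()  # Surface API errors such as an invalid key.
    return response.text
```

Encoding the query with `urlencode` turns `What is the capital of France?` into `q=What+is+the+capital+of+France%3F`, so spaces and the trailing question mark do not break the Perplexity URL.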

Troubleshooting

  • No response extracted: Increase the --wait time; longer answers take longer to generate
  • API errors: Verify that your API key is correct and has available credits
  • Empty responses: Inspect perplexity_response.html to see what page content was returned
  • Cloudflare blocks: The script uses residential proxies by default, which should bypass most blocks
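The first troubleshooting tip can be automated by retrying with progressively longer waits. This is a hypothetical helper, not part of the script: `fetch` and `extract` stand in for whatever functions perform the ScrapingAnt request and the HTML parsing, and the wait schedule is arbitrary.

```python
from typing import Callable, Iterable


def answer_with_longer_waits(
    query: str,
    fetch: Callable[[str, int], str],
    extract: Callable[[str], str],
    waits_ms: Iterable[int] = (40000, 60000, 90000),
) -> str:
    # Try each wait time in turn and stop at the first non-empty answer.
    for wait_ms in waits_ms:
        html = fetch(query, wait_ms)
        answer = extract(html)
        if answer.strip():
            return answer
    # Every attempt came back empty; check the saved HTML for clues.
    return ""
```

Each attempt is a separate API call, so this trades extra credits (more with residential proxies) for a better chance of getting a complete answer.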

License

MIT License

About

Perplexity.ai scraper using ScrapingAnt API with browser rendering
