A simple Amazon scraper that extracts product details and prices from Amazon.com using Python Requests and Selectorlib.
There are two simple scrapers in this project:
- Amazon Product Page Scraper: amazon.py
- Amazon Search Results Page Scraper: searchresults.py
To set up, from a terminal:
- Clone this project with git clone https://github.com/scrapehero-code/amazon-scraper.git and cd into it with cd amazon-scraper
- Add a virtual environment (optional): python3 -m venv .venv
- Activate the virtual environment (optional): source .venv/bin/activate
- Install the requirements: pip3 install -r requirements.txt
To run the Amazon Product Page Scraper:
- Add Amazon product page URLs to urls.txt
- Run python3 amazon.py
- Get the data from output/product.jsonl
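The .jsonl extension indicates JSON Lines, one JSON object per line. A minimal sketch for loading such a file in Python, assuming only that each non-empty line is a complete JSON record; parse_jsonl and load_jsonl are hypothetical helpers, not part of this project:

```python
import json
from pathlib import Path


def parse_jsonl(lines):
    """Parse JSON Lines: every non-empty line is one JSON object."""
    # Blank lines (e.g. a trailing newline at the end of the file) are skipped.
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path):
    """Load every record from a .jsonl file such as output/product.jsonl."""
    with Path(path).open(encoding="utf-8") as f:
        return parse_jsonl(f)
```

For example, load_jsonl("output/product.jsonl") returns a list of dicts; which keys they contain depends on the selectors the scraper uses.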
To run the Amazon Search Results Page Scraper (it only scrapes products from the first page of search results):
- Add Amazon search results page URLs to search_results_urls.txt
- Run python3 searchresults.py, or python3 searchresults.py -removeAds to leave ads out of the results
- Get the data from output/search_results_output.jsonl
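With many keywords, the search URLs can be generated instead of copied by hand. A minimal sketch, assuming Amazon search pages use the https://<domain>/s?k=<keyword> form and that search_results_urls.txt holds one URL per line; build_search_url and write_search_urls are hypothetical helpers:

```python
from urllib.parse import urlencode


def build_search_url(keyword, domain="www.amazon.com"):
    """Build a search results URL of the (assumed) form https://<domain>/s?k=<keyword>."""
    # urlencode quotes the keyword, turning spaces into "+".
    return f"https://{domain}/s?{urlencode({'k': keyword})}"


def write_search_urls(keywords, path="search_results_urls.txt", domain="www.amazon.com"):
    """Write one search URL per line (the one-per-line format is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        for keyword in keywords:
            f.write(build_search_url(keyword, domain) + "\n")
```

For example, write_search_urls(["wireless mouse", "usb c hub"]) writes two URLs, the first being https://www.amazon.com/s?k=wireless+mouse.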
Check the output README.
- I am seeing \u followed by four hex digits in my output (for example, in prices)
This is a Unicode escape sequence. For example, \u00a3 is the UK pound sign, so \u00a3250.00 is £250.00 once the escape is decoded.
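In Python, json.loads decodes these escapes when it parses a line, and json.dumps only writes them because its ensure_ascii option defaults to True. A minimal sketch that rewrites records with the characters shown literally, assuming the output files are UTF-8 JSON Lines; readable_line and make_readable are hypothetical helpers:

```python
import json


def readable_line(line):
    """Re-serialise one JSON Lines record with non-ASCII characters written literally."""
    # json.loads turns an escape such as \u00a3 into the character itself;
    # ensure_ascii=False stops json.dumps from escaping it again.
    return json.dumps(json.loads(line), ensure_ascii=False)


def make_readable(in_path, out_path):
    """Copy a .jsonl file, passing every non-empty line through readable_line."""
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(readable_line(line) + "\n")
```

For example, make_readable("output/product.jsonl", "output/product_readable.jsonl") writes a copy in which prices appear as £250.00; the second file name is only an illustration.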
- The URL output from searchresults.py is not a full URL
It is a relative path. Prefix it with the domain of the Amazon region you scraped, for example https://www.amazon.co.uk for the UK site.
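Python's urllib.parse.urljoin can do this prefixing. A minimal sketch; absolute_url is a hypothetical helper, and its default base is the UK domain used in the example above:

```python
from urllib.parse import urljoin


def absolute_url(relative_url, base="https://www.amazon.co.uk"):
    """Turn a relative path from searchresults.py into a full URL on the given Amazon domain."""
    # urljoin returns URLs that already have a scheme and host unchanged.
    return urljoin(base, relative_url)
```

urljoin is used instead of plain string concatenation because it leaves already-absolute URLs alone and does not produce a doubled slash when the base ends with "/".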