Basic setup with random user agents and proxy addresses for the Python Scrapy framework.
#### Setup
- Install the Scrapy framework: `pip install Scrapy` (see the detailed installation guide)
- Install Beautiful Soup 4: `pip install beautifulsoup4`
#### Usage
To see what it does, just run:

`python run.py`
The project contains two middleware classes in middlewares.py. ProxyMiddleware downloads a list of proxy IP addresses and picks one at random before each request is processed. RandomUserAgentMiddleware works similarly: it downloads user agent strings, saves them into the `USER_AGENT_LIST` setting, and selects one at random before each request is processed. Both middlewares are activated in the settings.py file.
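A minimal sketch of what the two middlewares might look like. The class and method names follow Scrapy's downloader-middleware convention (`process_request`); the hard-coded `PROXY_LIST` and `USER_AGENT_LIST` values here are placeholder assumptions, since in the real project both lists are downloaded at startup:

```python
import random

# Placeholder pools; in the project these are fetched from the network
# and (for user agents) stored in the USER_AGENT_LIST setting.
PROXY_LIST = ["http://10.0.0.1:8080", "http://10.0.0.2:3128"]
USER_AGENT_LIST = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

class ProxyMiddleware:
    """Assign a randomly chosen proxy before each request is processed."""
    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXY_LIST)

class RandomUserAgentMiddleware:
    """Assign a randomly chosen User-Agent header before each request is processed."""
    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENT_LIST)
```

Scrapy calls `process_request` on every outgoing request once the classes are registered under `DOWNLOADER_MIDDLEWARES` in settings.py.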
This project also contains two spiders for testing purposes, spiders/iptester.py and spiders/uatester.py. You can run them individually:

`scrapy crawl UAtester`

`scrapy crawl IPtester`
run.py is also a good example of how to include and run your spiders sequentially from one script.
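One way to run spiders sequentially from a single script, following the pattern in Scrapy's documentation for running multiple spiders in the same process with `CrawlerRunner` and chained deferreds (this is a sketch, not necessarily how run.py itself is written; the spider names come from the two test spiders above):

```python
def run_sequentially():
    # Imports are deferred into the function so the sketch can be
    # inspected without Scrapy and Twisted installed.
    from twisted.internet import defer, reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl():
        # Each yield waits for the previous spider to finish before
        # the next one starts.
        yield runner.crawl("UAtester")
        yield runner.crawl("IPtester")
        reactor.stop()

    crawl()
    reactor.run()

if __name__ == "__main__":
    run_sequentially()
```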
If you have any questions or problems, feel free to ask me via email.