A spider that crawls websites and collects their images.
# clone the git repository
git clone https://github.com/luofei2011/image-spider.git
cd image-spider
# install the Node.js dependencies
npm install
# add test.js
touch test.js
vim test.js
# insert the following code:
var Spider = require('./spider');
var spider = new Spider('http://poised-flw.com', {
    level: 3,
    maxSockets: 4,
    downloadImage: true
});
spider.start();
# save & quit
# then execute the file
node test.js
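useAgent: the User-Agent (UA) string the spider sends with its requests.
maxSockets: the maximum number of concurrent connections the spider opens.
level: the crawling depth, i.e. how many levels of links the spider follows from the start URL.
onlyHost: whether the spider only crawls pages on the same domain as the start URL. Default: true.
downloadImage: whether to download images while crawling. Default: false.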
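The level and onlyHost options can be pictured as a breadth-first crawl that stops after a fixed number of link layers and, by default, ignores links to other hosts. The sketch below is a hypothetical illustration, not image-spider's implementation: it walks an in-memory link graph (the links map and the crawlOrder name are invented for the example) instead of fetching pages, uses the documented defaults for onlyHost and downloadImage, and falls back to a depth of 1 when level is missing. Whether the real spider counts the start page as depth 0 or 1 is not stated here; the sketch assumes 1.

```javascript
// Hypothetical sketch (not image-spider's code): breadth-first crawl order over an
// in-memory link graph, illustrating the documented `level` and `onlyHost` options.

// Defaults stated in the option list; other options must be passed in.
const DEFAULTS = { onlyHost: true, downloadImage: false };

function crawlOrder(startUrl, links, userOptions = {}) {
  const options = { ...DEFAULTS, ...userOptions };
  const maxDepth = options.level ?? 1; // fallback for this sketch only
  const start = new URL(startUrl).href; // normalizes "http://a.com" to "http://a.com/"
  const startHost = new URL(start).host;
  const visited = new Set([start]);
  const order = [];
  let frontier = [start];

  // The start page is depth 1 (assumption); each iteration crawls one more layer.
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const page of frontier) {
      order.push(page);
      for (const href of links[page] || []) {
        const target = new URL(href, page).href; // resolve relative links
        // With onlyHost enabled, links to other hosts are never queued.
        if (options.onlyHost && new URL(target).host !== startHost) continue;
        if (!visited.has(target)) {
          visited.add(target);
          next.push(target);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

For example, crawlOrder('http://example.com', links, { level: 2 }) returns the start page followed by the same-host pages it links to; passing onlyHost: false also lets links to other hosts into the next layer.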
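The image src URLs are written to $(pwd)/log/images_log, that is, log/images_log under the directory you run the spider from. You can download them with download.sh, or set downloadImage: true to download them during the crawl.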
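If you want to inspect or post-process the log before downloading, a small Node.js script can read it. This is a minimal sketch that assumes log/images_log holds one absolute image URL per line (the exact format is not documented here). readImageLog and localFileName are hypothetical helper names, and the script does not replace download.sh.

```javascript
// Hypothetical sketch: read log/images_log (assumed: one image URL per line),
// drop blank lines and duplicates, and derive a local file name for each URL.
const fs = require('fs');
const path = require('path');

function readImageLog(logPath) {
  const lines = fs.readFileSync(logPath, 'utf8').split(/\r?\n/);
  const urls = lines.map((line) => line.trim()).filter((line) => line.length > 0);
  return [...new Set(urls)]; // a Set keeps the first occurrence of each URL, in order
}

function localFileName(imageUrl) {
  // Use the last path segment (query string excluded); fall back when the path ends in '/'.
  const base = path.posix.basename(new URL(imageUrl).pathname);
  return base || 'index';
}

// Usage:
// const urls = readImageLog(path.join(process.cwd(), 'log', 'images_log'));
// urls.forEach((u) => console.log(u, '->', localFileName(u)));
```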
You can extend this tool to handle other file types, such as JS, CSS, and HTML files.
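One way to extend the tool is to collect other asset types alongside images. The hypothetical extractAssets function below pulls image, script, stylesheet, and page-link URLs out of an HTML string with regular expressions and resolves them against the page URL. It is a sketch of the idea, not part of image-spider; the patterns are simplifications (for example, stylesheets are recognized by a .css href rather than by rel), and a real HTML parser would be more robust.

```javascript
// Hypothetical extension sketch, not part of image-spider: collect several kinds
// of asset URLs from one HTML page. Regexes are a simplification; a real HTML
// parser handles unquoted attributes, comments, and other edge cases better.
const ASSET_PATTERNS = {
  image: /<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi,
  script: /<script\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi,
  // Treat <link href="...css"> as a stylesheet (assumption; rel is not checked).
  stylesheet: /<link\b[^>]*?\bhref\s*=\s*["']([^"']+\.css(?:\?[^"']*)?)["']/gi,
  page: /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi,
};

function extractAssets(html, pageUrl) {
  const result = {};
  for (const [type, pattern] of Object.entries(ASSET_PATTERNS)) {
    const urls = new Set();
    for (const match of html.matchAll(pattern)) {
      let url;
      try {
        url = new URL(match[1], pageUrl); // resolve relative references against the page
      } catch {
        continue; // skip references that are not valid URLs
      }
      // Skip mailto:, javascript:, and other non-HTTP schemes.
      if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;
      url.hash = ''; // "#section" fragments point into the same document
      urls.add(url.href);
    }
    result[type] = [...urls];
  }
  return result;
}
```

The page list is what a crawler would feed back into its queue, while the other lists would be logged or downloaded the same way image URLs are.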
If you run into any problems, please let me know. Thanks!
This tool is intended only for learning Node.js.