Scrapi is a Vapor-based microservice that fetches and parses HTML URL and returns parsed JSON data accordingly to parameters passed.
Call POST /getList endpoint to get a JSON list.
Pass configuration JSON in POST body. Configuration JSON consists of 3 parts:
url- URL to be called and parsedlistSelector- CSS selector that matches list elements. Usually that is<tr>tag of a tableattributeSelectors- key-value pairs where value is an CSS selector. from each match inlistSelectora JSON object will be generated where keys will remain the same but values will be replaced with corresponding CSS selector results.
curl -X POST \
http://localhost:8080/getList \
-H 'Content-Type: application/json' \
-d '{
"url": "https://www.ss.com/lv/real-estate/flats/riga/all/",
"listSelector": "#filter_frm table[align=center] tr",
"attributeSelectors": {
"url": ":nth-child(1) a | href",
"location": ":nth-child(4)",
"price": ":nth-child(9)"
}
}'
[
{
"url": "/msg/lv/real-estate/flats/riga/centre/akljo.html",
"location": "centrs Pērnavas 20",
"price": "35 €/dienā"
},
{
"url": "/msg/lv/real-estate/flats/riga/imanta/gedhl.html",
"location": "Imanta Kurzemes pr. 24",
"price": "240 €/mēn."
},
...
]Call POST /getObject endpoint to get JSON object.
Pass configuration JSON in POST body. Configuration JSON consists of 3 parts:
url- URL to be called and parsedattributeSelectors- key-value pairs where value is an CSS selector. from each match inlistSelectora JSON object will be generated where keys will remain the same but values will be replaced with corresponding CSS selector results.
curl -X POST \
http://localhost:8080/getObject \
-H 'Content-Type: application/json' \
-d '{
"url": "https://www.ss.com/msg/lv/real-estate/flats/riga/centre/emfkl.html",
"attributeSelectors": {
"description": "#msg_div_msg ",
"price": ".ads_price",
"date": "#page_main td[valign=bottom] tbody tr:nth-child(2) td[align=right]",
"meta": "#msg_div_msg .options_list"
}
}
'
{
"meta": "Pilsēta: Rīga Rajons: centrs Iela: Hospitāļu 23 Ērtības: Visas ērtības",
"price": "700 €/mēn. (8.24 €/m²)",
"description": "Сдается полностью меблированная квартира с хорошей планировкой и высококачественной отделкой в престижном проекте \"Шоколад\". Просторная квартира(85 кв. м) находится на 4 этаже дома с лифтом, наблюдением(консьерж) , в деловом центре города, с хорошей инфраструктурой. Оборудована всей необходимой техникой(холодильник, плита, духовка, стиральная, посудомоечная машины, телевизор). Планировка: гостиная-студия, 2 спальни, прихожая с гардеробом, 2 санузла(ванная комната с душевой кабиной и туалетом; гостевой туалет). Pilsēta: Rīga Rajons: centrs Iela: Hospitāļu 23 [Karte] Istabas: 3 Platība: 85 Stāvs: 4/7 Sērija: Jaun. Mājas tips: Paneļu - ķieģeļu Ērtības: Visas ērtības Cena: 700 €/mēn. (8.24 €/m²)",
"date": "Datums: 07.08.2019 20:03"
}By default the value is the text representation of tag contents, but it can be altered to html representation or a specific tag.
For example img.my_image | src will find img tag and will return the value of src attribute. div#main_contents | html will return div tag's HTML contents, but div#main_contents | text or just div#main_contents will return plain text contents.
- Install Vapor: https://docs.vapor.codes/3.0/install/macos/
- Run the server on localhost:
vapor run