This project was created for "SKNS Warsztaty z Pythona".
It consists of a crawler that scrapes real estate data from Gumtree and Jupyter notebooks with ML.
To clone the repository, run:
git clone https://github.com/Santhin/real-estate
To run the crawler locally, from the crawler directory:
cd real-estate/crawler
pip install -r requirements.txt
python app.py
.
├── crawler
│   ├── app.py
│   ├── aps_asyncio.py
│   ├── gumtree
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── gumtree_crawler.py
│   │       ├── __init__.py
│   │       └── stack.py
│   ├── install_asyncio.py
│   ├── Procfile
│   ├── requirements.txt
│   └── scrapy.cfg
├── LICENSE
├── ml
│   ├── features
│   │   ├── rankingcen.xlsx
│   │   ├── Ranking\ Dzielnic\ 2020\ Warszawa.pdf
│   │   ├── ranking_dzielnic_warszawy_pod_wzgledem_atrakcyjnosci_warunkow_zycia_2017.pdf
│   │   ├── ranking_otodom.csv
│   │   ├── ranking.txt
│   │   └── ranking.xlsx
│   ├── notebooks
│   │   ├── ML\ endgame\ floydhub.ipynb
│   │   ├── ML\ endgame.ipynb
│   │   ├── NLP\ eda\ etc.ipynb
│   │   ├── Pipeline\ mongoRaw\ to\ clean\ before\ EDA.ipynb
│   │   └── real\ EDA.ipynb
│   └── pictures
│       ├── images.png
│       ├── ml_map.png
│       ├── simple-house-exterior-white-background_1308-50195.jpg
│       ├── unnamed.jpg
│       └── white-house-background-check-democratic-party-republican-party-house-png.jpg
└── README.md
6 directories, 32 files
The crawler was deployed on Heroku and triggered every 15 minutes by Advanced Python Scheduler (APScheduler).
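The 15-minute schedule can be sketched roughly as follows. This is a minimal illustration, not the code in app.py or aps_asyncio.py: it assumes the APScheduler 3.x API, a spider named gumtree_crawler (a guess based on the file name), and that the scrapy command is on the PATH. The helper names build_crawl_command, run_crawl and main are hypothetical.

```python
import subprocess

# Interval stated in this README: the crawler ran every 15 minutes.
CRAWL_INTERVAL_MINUTES = 15

# Assumption: the spider name matches the file spiders/gumtree_crawler.py.
SPIDER_NAME = "gumtree_crawler"


def build_crawl_command(spider_name: str = SPIDER_NAME) -> list[str]:
    """Return the command line that starts one crawl of the given spider."""
    return ["scrapy", "crawl", spider_name]


def run_crawl() -> None:
    """Run one crawl in a child process.

    A fresh process per run sidesteps the fact that a Twisted reactor
    cannot be restarted inside the same Python process.
    """
    subprocess.run(build_crawl_command(), check=False)


def main() -> None:
    """Start a blocking scheduler that triggers run_crawl on a fixed interval."""
    # Imported here so the helpers above work without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    scheduler.add_job(run_crawl, "interval", minutes=CRAWL_INTERVAL_MINUTES)
    scheduler.start()  # blocks until the process is stopped
```

Calling main() from the crawler directory (where scrapy.cfg lives) would keep the process alive and start a crawl every 15 minutes. On Heroku, a process like this would typically be started from the Procfile.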
- add a requirements.txt to the ml folder