Santhin/real-estate

🧐 About

This project was created for "SKNS Warsztaty z Pythona" (a Python workshop series).
It consists of two parts: a crawler that scrapes real estate listings from Gumtree, and Jupyter notebooks that apply ML to the scraped data.

🏁 Getting Started

To clone the repository, run:

git clone https://github.com/Santhin/real-estate

To run the crawler locally:

cd real-estate/crawler
pip install -r requirements.txt
python app.py

Project structure

.
├── crawler
│   ├── app.py
│   ├── aps_asyncio.py
│   ├── gumtree
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── gumtree_crawler.py
│   │       ├── __init__.py
│   │       └── stack.py
│   ├── install_asyncio.py
│   ├── Procfile
│   ├── requirements.txt
│   └── scrapy.cfg
├── LICENSE
├── ml
│   ├── features
│   │   ├── rankingcen.xlsx
│   │   ├── Ranking\ Dzielnic\ 2020\ Warszawa.pdf
│   │   ├── ranking_dzielnic_warszawy_pod_wzgledem_atrakcyjnosci_warunkow_zycia_2017.pdf
│   │   ├── ranking_otodom.csv
│   │   ├── ranking.txt
│   │   └── ranking.xlsx
│   ├── notebooks
│   │   ├── ML\ endgame\ floydhub.ipynb
│   │   ├── ML\ endgame.ipynb
│   │   ├── NLP\ eda\ etc.ipynb
│   │   ├── Pipeline\ mongoRaw\ to\ clean\ before\ EDA.ipynb
│   │   └── real\ EDA.ipynb
│   └── pictures
│       ├── images.png
│       ├── ml_map.png
│       ├── simple-house-exterior-white-background_1308-50195.jpg
│       ├── unnamed.jpg
│       └── white-house-background-check-democratic-party-republican-party-house-png.jpg
└── README.md

7 directories, 32 files

🚀 Deployment

The crawler was deployed on Heroku and triggered every 15 minutes by Advanced Python Scheduler (APScheduler).
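
For illustration only, a 15-minute schedule along these lines could look roughly like the sketch below. It is not the contents of app.py or aps_asyncio.py: the spider name `gumtree_crawler` is an assumption taken from the file name, the helper names are hypothetical, and `BlockingScheduler` is used here for simplicity even though the repository appears to use an asyncio-based setup. It assumes APScheduler 3.x and Scrapy are installed.

```python
import subprocess
import sys

# Interval stated in this README.
CRAWL_INTERVAL_MINUTES = 15

# Assumption: the spider is registered under this name; check the spider's
# `name` attribute in gumtree/spiders/ before relying on it.
SPIDER_NAME = "gumtree_crawler"


def build_crawl_command(spider_name=SPIDER_NAME):
    """Return the command line that starts one Scrapy crawl (hypothetical helper)."""
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]


def run_crawl():
    """Run a single crawl in a subprocess so each run gets a fresh Twisted reactor."""
    subprocess.run(build_crawl_command(), check=False)


def main():
    # Imported lazily so the helpers above can be used without APScheduler.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    # Fire run_crawl every CRAWL_INTERVAL_MINUTES minutes.
    scheduler.add_job(run_crawl, "interval", minutes=CRAWL_INTERVAL_MINUTES)
    scheduler.start()  # blocks until the process is stopped
```

Calling `main()` from the crawler directory (where scrapy.cfg lives) would start the schedule. Running each crawl in a subprocess is one way to avoid restarting Scrapy's reactor inside a long-lived process.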

⛏️ Built Using

🛠️ Todo

  • Add a requirements.txt to the ml folder
