Subito.it Scraper

A Python scraper that extracts property listings from Subito.it (an Italian classifieds portal) using the ScrapingAnt API.

Features

  • Scrapes apartments, villas, land, commercial properties, and more
  • Supports sale and rental listings
  • Covers all Italian regions
  • Parallel scraping for improved performance
  • Extracts 30+ property attributes, including price, area, rooms, location, and amenities
  • Exports data to CSV format
  • Rate limiting and retry logic for reliability

Installation

  1. Clone the repository:
git clone https://github.com/kami4ka/SubitoScraper.git
cd SubitoScraper
  2. Create a virtual environment and install dependencies:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Usage

Command Line

# Scrape apartments for sale across all of Italy
python main.py --category vendita-appartamenti

# Scrape apartments for rent in Lombardia
python main.py --category affitto-appartamenti --region lombardia

# Scrape villas for sale in Toscana, stopping after 50 properties
python main.py --category vendita-ville --region toscana --limit 50

# Scrape the first 5 listing pages of all properties for sale, with verbose logging
python main.py --category vendita-immobili --max-pages 5 -v

Available Options

| Option | Description |
| --- | --- |
| --category, -c | Property category (default: vendita-immobili) |
| --region, -r | Region to filter by (optional; default: italia) |
| --output, -o | Output CSV file path (default: properties.csv) |
| --limit | Maximum number of properties to scrape |
| --max-pages | Maximum number of listing pages to scrape |
| --max-workers, -w | Maximum parallel requests (default: 10) |
| --api-key, -k | ScrapingAnt API key (overrides config) |
| --verbose, -v | Enable verbose logging |
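
The options above map naturally onto an argparse parser. The following is a minimal sketch, not the code in main.py: the function name build_parser, the help strings, and the None defaults for --limit and --max-pages are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Subito.it property scraper")
    parser.add_argument("--category", "-c", default="vendita-immobili",
                        help="Property category")
    parser.add_argument("--region", "-r", default="italia",
                        help="Region to filter by")
    parser.add_argument("--output", "-o", default="properties.csv",
                        help="Output CSV file path")
    # Assumption: no limit is applied when these two flags are omitted.
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of properties to scrape")
    parser.add_argument("--max-pages", type=int, default=None,
                        help="Maximum number of listing pages to scrape")
    parser.add_argument("--max-workers", "-w", type=int, default=10,
                        help="Maximum parallel requests")
    parser.add_argument("--api-key", "-k", default=None,
                        help="ScrapingAnt API key")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-c", "vendita-ville", "-r", "toscana", "--limit", "50"])
print(args.category, args.region, args.limit, args.max_workers)
```

Note that argparse exposes --max-pages as args.max_pages, converting the hyphen to an underscore.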

Available Categories

| Category | Description |
| --- | --- |
| vendita-immobili | All properties for sale |
| vendita-appartamenti | Apartments for sale |
| vendita-ville | Villas for sale |
| vendita-terreni | Land for sale |
| vendita-garage | Garages for sale |
| vendita-loft | Lofts/mansards for sale |
| vendita-uffici | Offices/commercial for sale |
| affitto-immobili | All properties for rent |
| affitto-appartamenti | Apartments for rent |
| affitto-ville | Villas for rent |
| affitto-camere | Rooms for rent |
| affitto-garage | Garages for rent |
| affitto-loft | Lofts/mansards for rent |
| affitto-uffici | Offices/commercial for rent |
| affitto-vacanze | Vacation rentals |
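
Every category name combines a contract prefix (vendita for sale, affitto for rent) with a property kind. A small helper can split the two, for example to fill the contract_type column. This is a sketch under assumptions: split_category and CONTRACT_TYPES are hypothetical names, and the scraper itself may derive the contract type differently.

```python
from typing import Tuple

# Italian contract prefixes used in the category names, mapped to English labels.
CONTRACT_TYPES = {"vendita": "sale", "affitto": "rent"}


def split_category(category: str) -> Tuple[str, str]:
    """Split e.g. 'vendita-appartamenti' into ('sale', 'appartamenti')."""
    prefix, sep, kind = category.partition("-")
    if not sep or not kind or prefix not in CONTRACT_TYPES:
        raise ValueError(f"Unrecognized category: {category!r}")
    return CONTRACT_TYPES[prefix], kind


print(split_category("vendita-appartamenti"))
print(split_category("affitto-vacanze"))
```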

Supported Regions

All 20 Italian regions are supported: Lombardia, Lazio, Campania, Sicilia, Veneto, Emilia-Romagna, Piemonte, Puglia, Toscana, Calabria, Sardegna, Liguria, Marche, Abruzzo, Friuli-Venezia-Giulia, Trentino-Alto-Adige, Umbria, Basilicata, Molise, Valle-d-Aosta.

Use --region italia (default) to search nationwide.
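
Validating the region before any request is sent catches typos early. The sketch below assumes the command-line values are the lowercase, hyphenated forms of the names listed above (as in the lombardia and toscana examples); REGIONS, NATIONWIDE, and normalize_region are hypothetical names, not part of the project.

```python
# Assumed slugs: lowercase versions of the 20 region names listed above.
REGIONS = (
    "lombardia", "lazio", "campania", "sicilia", "veneto",
    "emilia-romagna", "piemonte", "puglia", "toscana", "calabria",
    "sardegna", "liguria", "marche", "abruzzo", "friuli-venezia-giulia",
    "trentino-alto-adige", "umbria", "basilicata", "molise", "valle-d-aosta",
)
NATIONWIDE = "italia"


def normalize_region(value: str) -> str:
    """Return the region slug, or raise ValueError for an unknown region."""
    slug = value.strip().lower().replace(" ", "-")
    if slug == NATIONWIDE or slug in REGIONS:
        return slug
    raise ValueError(f"Unknown region: {value!r}")


print(normalize_region("Toscana"))
print(normalize_region("Emilia-Romagna"))
```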

Output Format

The scraper exports data to CSV with the following fields:

| Field | Description |
| --- | --- |
| url | Property listing URL |
| listing_id | Unique listing identifier |
| title | Property title |
| property_type | Type (Appartamento, Villa, etc.) |
| contract_type | Sale or rent |
| price | Listed price in EUR |
| city | City name |
| province | Province code (e.g., MI, RM) |
| living_area | Living area in m² |
| rooms | Number of rooms (locali) |
| bedrooms | Number of bedrooms |
| bathrooms | Number of bathrooms |
| floor | Floor level |
| condition | Property condition |
| elevator | Elevator availability |
| balcony | Balcony availability |
| terrace | Terrace availability |
| garden | Garden availability |
| air_conditioning | AC availability |
| parking | Parking details |
| furnished | Furnished status |
| available_immediately | Immediate availability |
| energy_class | Energy certificate class |
| heating | Heating type |
| description | Property description |
| seller_name | Seller/agent name |
| date_posted | Listing date |
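
A CSV with these columns can be written and read back with the standard csv module. The sketch below is illustrative only: write_rows and read_rows are hypothetical helpers, the sample row is made-up data rather than a real listing, and the column order is assumed to follow the table above. It uses an in-memory buffer so it runs without touching properties.csv.

```python
import csv
import io

# Column names taken from the field table; the order is an assumption.
FIELDS = [
    "url", "listing_id", "title", "property_type", "contract_type", "price",
    "city", "province", "living_area", "rooms", "bedrooms", "bathrooms",
    "floor", "condition", "elevator", "balcony", "terrace", "garden",
    "air_conditioning", "parking", "furnished", "available_immediately",
    "energy_class", "heating", "description", "seller_name", "date_posted",
]


def write_rows(rows, stream):
    """Write dict rows as CSV; fields missing from a row become empty strings."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


def read_rows(stream):
    """Read the CSV back into a list of dicts keyed by column name."""
    return list(csv.DictReader(stream))


# Made-up sample row for illustration only.
sample = [{"url": "https://example.com/listing/1", "listing_id": "1",
           "title": "Sample flat", "price": "250000", "city": "Milano", "province": "MI"}]

buffer = io.StringIO()
write_rows(sample, buffer)
buffer.seek(0)
rows = read_rows(buffer)
print(rows[0]["city"], rows[0]["price"])
```

When writing to a real file, open it with newline="" as the csv module documentation recommends, so that line endings are not translated twice.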

API Configuration

The scraper fetches pages through the ScrapingAnt API, so it needs an API key. Provide the key in one of two ways:

  1. Environment variable: export SCRAPINGANT_API_KEY=your_key
  2. Command line: --api-key YOUR_KEY

Configuration options in config.py:

  • SCRAPINGANT_API_KEY: Your API key
  • DEFAULT_MAX_WORKERS: Parallel request limit (default: 10)
  • DEFAULT_TIMEOUT: Request timeout in seconds (default: 120)
  • MAX_RETRIES: Number of retry attempts (default: 3)
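
The worker limit and retry count could interact roughly as in the sketch below, which pairs a thread pool with exponential backoff. It is not the scraper's actual implementation: with_retries, fetch_all, and base_delay are hypothetical, MAX_RETRIES is interpreted here as the total number of attempts, and the fetch callable stands in for whatever function performs the ScrapingAnt request.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_MAX_WORKERS = 10  # documented default for parallel requests
MAX_RETRIES = 3           # assumed to mean total attempts per URL


def with_retries(func, arg, retries=MAX_RETRIES, base_delay=1.0):
    """Call func(arg); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(1, retries + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))


def fetch_all(urls, fetch, max_workers=DEFAULT_MAX_WORKERS, base_delay=1.0):
    """Fetch every URL in parallel, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda url: with_retries(fetch, url, base_delay=base_delay), urls))


# Stand-in fetcher so the sketch runs offline; a real one would call the API.
print(fetch_all(["page-1", "page-2"], lambda url: f"html for {url}", max_workers=2))
```

ThreadPoolExecutor.map preserves input order, so results line up with the URLs even when requests finish out of order.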

License

MIT License
