Collect structured Peru real estate listings from Adondevivir search pages and filters in a clean, analysis-ready format. This project automates high-volume property listings scraping so teams can track pricing, locations, and listing freshness without manual copy-paste.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for adondevivir-property-listings-scraper, you've just found your team — Let’s Chat. 👆👆
This project extracts comprehensive property listing data from Adondevivir result pages using either direct listing URLs or search filter parameters. It solves the problem of inconsistent, slow, and error-prone manual data collection by delivering standardized fields suitable for analytics, CRMs, and market dashboards. It’s built for real estate agencies, investors, researchers, and PropTech builders who need repeatable, scalable access to Peru property listings.
- Scrapes multiple property categories (houses, apartments, land, commercial) with consistent field output
- Supports both URL-based extraction and filter-based discovery workflows
- Captures pricing structures (multi-currency, multi-operation) for deeper market analysis
- Extracts location hierarchy plus geo-coordinates for mapping and area comparisons
- Includes retry + failure-tolerant options to keep long runs stable
| Feature | Description |
|---|---|
| URL-based scraping mode | Provide one or more search/list URLs and extract listings up to a defined limit per URL. |
| Search-filter scraping mode | Discover listings by keyword + property type + operation type + sorting + page start. |
| Proxy-ready requests | Runs reliably behind rotating IPs to reduce blocks and improve stability at scale. |
| Retry + resilience controls | Configure per-URL retry attempts and optionally continue when specific URLs fail. |
| Rich listing normalization | Outputs structured objects with nested pricing, features, publisher, media, and location data. |
| Geo & map support | Extracts latitude/longitude and a static map reference for spatial analytics. |
| Image metadata extraction | Captures visible pictures array plus counts for media completeness scoring. |
| Listing freshness tracking | Stores modified date and status fields to monitor changes over time. |
| Field Name | Field Description |
|---|---|
| posting_id | Unique identifier for the property listing (useful for deduplication). |
| url | Listing URL or relative path for direct access to the detail page. |
| posting_code | Internal posting reference code shown on the marketplace. |
| title | Listing headline including key attributes and location hints. |
| price_operation_types | Array describing operation type (sale/rent) and one or more prices with currency. |
| expenses | Additional recurring or administrative cost data when available. |
| main_features | Primary features such as area, bedrooms, bathrooms, parking, and age. |
| general_features | Standard amenities/features when available. |
| development_features | Development or project-related features for new builds when available. |
| highlighted_features | Featured amenities or promoted attributes highlighted by the seller. |
| flags_features | Special markers or flags associated with the listing. |
| antiquity | Property age indicator when provided. |
| publisher | Agent/agency/seller metadata (name, profile URL, logo, tags, type). |
| url_logo | Brand/logo reference associated with the listing or publisher when present. |
| real_estate_type | Property category such as house, apartment, office, land, etc. |
| units | Unit data for projects or multi-unit developments when present. |
| publication | Publication timing fields for market trend and recency analysis. |
| premier | Whether the listing is marked as premium/promoted. |
| slot | Placement metadata indicating promoted positions when present. |
| slot_color | UI slot styling indicator when present. |
| house_info | Additional structural/architectural info when present. |
| description_normalized | Cleaned long-form description suitable for NLP and keyword extraction. |
| posting_location | Address and hierarchical location tree (country → province → city → zone → subzone). |
| posting_geolocation | Latitude/longitude and static map URL data for geo analytics. |
| visible_pictures | Array of image objects plus additional image count information. |
| status | Listing availability state (e.g., ONLINE). |
| posting_type | Listing classification (e.g., PROPERTY). |
| whatsapp | WhatsApp contact string when available. |
| modified_date | Last known modification timestamp for freshness monitoring. |
| from_url | Source results page URL used to discover the listing. |
[
{
"posting_id": "146903999",
"url": "/propiedades/clasificado/alclcain-alquilo-casa-oficina-av-los-ingenieros-485-la-molina-146903999.html",
"posting_code": "CASA OFICINA",
"title": "Alquilo Casa Oficina Av Los Ingenieros 485 - La Molina 200 m² A. C.",
"price_operation_types": [
{
"operation_type": { "name": "Alquiler", "operation_type_id": "2" },
"prices": [
{ "currency_id": "6", "amount": 5500, "formatted_amount": "5,500", "currency": "S/" },
{ "currency_id": "2", "amount": 1500, "formatted_amount": "1,500", "currency": "USD" }
]
}
],
"main_features": {
"c_f_t100": { "label": "Superficie total", "measure": "m²", "value": "260" },
"c_f_t101": { "label": "Superficie techada", "measure": "m²", "value": "200" },
"c_f_t2": { "label": "Dormitorios", "value": "8" },
"c_f_t3": { "label": "Baños", "value": "3" },
"c_f_t7": { "label": "Estacionamiento", "value": "1" }
},
"publisher": {
"publisher_id": "102518169",
"name": "ALEJANDRO MALAGA Agente]",
"url": "/inmobiliarias/alejandro-malaga-agente_102518169-inmuebles.html",
"premier": true
},
"posting_location": {
"address": { "name": "AV LOS INGENIEROS 485, LA MOLINA", "visibility": "EXACT" },
"location": { "name": "Santa Patricia Etapa Iii", "label": "SUBZONA" }
},
"posting_geolocation": {
"geolocation": { "latitude": -12.066212, "longitude": -76.9493649 }
},
"visible_pictures": {
"pictures": [
{ "order": 1, "url730x532": "https://img10.naventcdn.com/avisos/111/01/46/90/39/38/720x532/1543455438.jpg", "title": "Casa de 8 habitaciones, Lima" }
],
"additional_pictures_count": 33
},
"status": "ONLINE",
"posting_type": "PROPERTY",
"whatsapp": "51 956567336",
"modified_date": "2025-07-04T11:28:54-0400",
"from_url": "https://www.adondevivir.com/casas-en-alquiler.html"
}
]
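As a sketch of how downstream code might consume a dataset record like the one above — for example, flattening the nested `price_operation_types` array into one analysis-ready row per currency — assuming the field names shown in the sample (the `flatten_prices` helper itself is illustrative, not part of the scraper):

```python
# Flatten a listing's nested pricing into one row per (operation, currency)
# pair, keyed by posting_id for later deduplication. Field names follow the
# sample output above; this helper is a hypothetical consumer of the data.

def flatten_prices(listing: dict) -> list[dict]:
    rows = []
    for op in listing.get("price_operation_types", []):
        op_name = op.get("operation_type", {}).get("name")
        for price in op.get("prices", []):
            rows.append({
                "posting_id": listing.get("posting_id"),
                "operation": op_name,
                "currency": price.get("currency"),
                "amount": price.get("amount"),
            })
    return rows

# Trimmed version of the sample record above:
listing = {
    "posting_id": "146903999",
    "price_operation_types": [
        {
            "operation_type": {"name": "Alquiler", "operation_type_id": "2"},
            "prices": [
                {"currency_id": "6", "amount": 5500, "currency": "S/"},
                {"currency_id": "2", "amount": 1500, "currency": "USD"},
            ],
        }
    ],
}

rows = flatten_prices(listing)
```

Flattening at ingestion time makes multi-currency, multi-operation listings straightforward to load into a dataframe or SQL table.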
Adondevivir Property Listings Scraper/
├── src/
│ ├── main.py
│ ├── runner.py
│ ├── cli.py
│ ├── core/
│ │ ├── browser_manager.py
│ │ ├── request_queue.py
│ │ ├── retry_policy.py
│ │ └── throttler.py
│ ├── extractors/
│ │ ├── listings_page_parser.py
│ │ ├── listing_detail_parser.py
│ │ ├── features_normalizer.py
│ │ ├── pricing_parser.py
│ │ ├── location_parser.py
│ │ └── media_parser.py
│ ├── pipelines/
│ │ ├── url_mode.py
│ │ ├── filters_mode.py
│ │ └── pagination.py
│ ├── models/
│ │ ├── listing.py
│ │ ├── publisher.py
│ │ └── input_schema.py
│ ├── outputs/
│ │ ├── dataset_writer.py
│ │ ├── validators.py
│ │ └── exporters.py
│ └── config/
│ ├── settings.example.json
│ └── logging.yaml
├── data/
│ ├── input.sample.json
│ ├── urls.sample.txt
│ └── sample_output.json
├── tests/
│ ├── test_pricing_parser.py
│ ├── test_location_parser.py
│ └── test_listings_parser.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md
- Real estate agencies use it to build a searchable listings database, so they can respond faster to buyer/renter demand with up-to-date inventory.
- Property investors use it to track pricing shifts by district and property type, so they can spot undervalued opportunities earlier.
- Market researchers use it to measure listing freshness and volume trends, so they can publish reliable Peru housing market insights.
- PropTech teams use it to feed recommendation and matching systems, so users can discover better-fit properties with smarter filters.
- Operations teams use it to monitor premium/promoted listings and agent activity, so they can benchmark marketing performance across areas.
How do I choose between URL mode and search-filter mode? If you already have curated result-page URLs (from specific searches, categories, or cities), use URL mode for repeatable runs. If you want discovery-based collection (e.g., “lima” + “departamento” + “venta”), use search-filter mode to generate results dynamically and scrape from a chosen start page.
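The two modes can be illustrated with input payloads like these. Key names such as `start_urls`, `keyword`, and `sort` are assumptions for illustration; the authoritative schema lives in `src/models/input_schema.py` and `data/input.sample.json`.

```python
# Illustrative input payloads for the two modes; key names are assumptions
# based on this README, not the actor's verified schema.

url_mode_input = {
    "start_urls": ["https://www.adondevivir.com/casas-en-alquiler.html"],
    "max_items_per_url": 100,     # cap listings extracted per URL
    "max_retries_per_url": 3,     # retry transient per-URL failures
    "ignore_url_failures": True,  # keep the run alive if one URL fails
}

filter_mode_input = {
    "keyword": "lima",
    "property_type": "departamento",
    "operation_type": "venta",
    "sort": "recent",
    "start_page": 1,
}
```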
What’s the best way to avoid blocks and reduce failures? Use rotating residential IPs aligned with the target region (Peru/PE), keep max_items_per_url moderate during long runs, and enable ignore_url_failures so a single bad page does not stop the entire job. Increasing max_retries_per_url helps recover from transient errors.
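The retry-and-continue behaviour described above can be sketched with a generic wrapper (the real policy lives in `src/core/retry_policy.py`; this standalone version is illustrative):

```python
import time

def with_retries(fn, max_retries=3, backoff=0.0):
    """Call fn(); on exception, retry up to max_retries total attempts,
    sleeping backoff * 2**attempt seconds between tries. Returns fn's
    value, or None if every attempt failed — the caller can then skip
    the URL, mirroring the ignore_url_failures option."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    return None

# Demo with a fetch that fails twice before succeeding, as a flaky
# proxied request might:
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky_fetch, max_retries=3)
```

In the real scraper, `fn` would be the per-URL fetch routed through the rotating proxy, and a `None` result would be logged and skipped rather than aborting the run.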
Does it scrape only the list page or also detailed listing data? It collects detailed listing objects including nested pricing structures, publisher info, location hierarchy, geo-coordinates, and media references. If a field is missing on the site for a specific listing, the scraper returns it as empty/null rather than guessing.
How should I deduplicate data across multiple runs? Use posting_id as the primary key and store modified_date for freshness checks. When the same posting_id appears again, update the record if modified_date is newer, and keep history if you’re tracking market changes.
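The posting_id + modified_date upsert described above can be sketched as follows; the timestamp format matches the sample output (note the `-0400` offset, which `%z` parses):

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    # Matches the sample output's format, e.g. "2025-07-04T11:28:54-0400".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S%z")

def upsert(store: dict, listing: dict) -> None:
    """Keep one record per posting_id, preferring the newest modified_date."""
    pid = listing["posting_id"]
    old = store.get(pid)
    if old is None or parse_ts(listing["modified_date"]) > parse_ts(old["modified_date"]):
        store[pid] = listing

store: dict = {}
upsert(store, {"posting_id": "146903999",
               "modified_date": "2025-07-04T11:28:54-0400", "status": "ONLINE"})
# A stale duplicate from an earlier run leaves the newer record in place:
upsert(store, {"posting_id": "146903999",
               "modified_date": "2025-06-01T09:00:00-0400", "status": "ONLINE"})
```

For history tracking, append superseded records to an audit table instead of discarding them.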
- Primary Metric: typically processes 20–50 listings per minute per worker on stable connections when scraping mixed result pages with detail enrichment.
- Reliability Metric: 96–99% successful page completion rate on multi-URL runs when proxies are enabled and max_retries_per_url is set to 2–3.
- Efficiency Metric: bounded concurrency keeps memory steady; long runs commonly stay within 500–800 MB of RAM per worker with controlled page reuse.
- Quality Metric: 90–98% field completeness on listings that include full metadata, with strongest coverage for pricing, title, location hierarchy, and media counts.
