hyperlordnovaai/adondevivir-property-listings-scraper

Adondevivir Property Listings Scraper

Collect structured Peru real estate listings from Adondevivir search pages and filters in a clean, analysis-ready format. This project automates high-volume property listings scraping so teams can track pricing, locations, and listing freshness without manual copy-paste.


Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for an adondevivir-property-listings-scraper, you've just found your team. Let's Chat!

Introduction

This project extracts comprehensive property listing data from Adondevivir result pages using either direct listing URLs or search filter parameters. It solves the problem of inconsistent, slow, and error-prone manual data collection by delivering standardized fields suitable for analytics, CRMs, and market dashboards. It’s built for real estate agencies, investors, researchers, and PropTech builders who need repeatable, scalable access to Peru property listings.

Built for Peru Real Estate Market Intelligence

  • Scrapes multiple property categories (houses, apartments, land, commercial) with consistent field output
  • Supports both URL-based extraction and filter-based discovery workflows
  • Captures pricing structures (multi-currency, multi-operation) for deeper market analysis
  • Extracts location hierarchy plus geo-coordinates for mapping and area comparisons
  • Includes retry + failure-tolerant options to keep long runs stable

Features

| Feature | Description |
| --- | --- |
| URL-based scraping mode | Provide one or more search/list URLs and extract listings up to a defined limit per URL. |
| Search-filter scraping mode | Discover listings by keyword + property type + operation type + sorting + page start. |
| Proxy-ready requests | Runs reliably behind rotating IPs to reduce blocks and improve stability at scale. |
| Retry + resilience controls | Configure per-URL retry attempts and optionally continue when specific URLs fail. |
| Rich listing normalization | Outputs structured objects with nested pricing, features, publisher, media, and location data. |
| Geo & map support | Extracts latitude/longitude and a static map reference for spatial analytics. |
| Image metadata extraction | Captures the visible pictures array plus counts for media completeness scoring. |
| Listing freshness tracking | Stores modified date and status fields to monitor changes over time. |
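The two scraping modes take different inputs. As a rough sketch of what each mode's payload might look like (the field names below are illustrative, based on the feature list, and are not necessarily the actor's exact input schema):

```python
# Illustrative input payloads for the two scraping modes.
# Field names are assumptions drawn from the feature descriptions above,
# not the actor's exact schema.

url_mode_input = {
    "start_urls": [
        "https://www.adondevivir.com/casas-en-alquiler.html",
    ],
    "max_items_per_url": 100,     # cap on listings extracted per URL
    "max_retries_per_url": 3,     # retry transient per-URL failures
    "ignore_url_failures": True,  # keep the run going if one URL keeps failing
}

filter_mode_input = {
    "keyword": "lima",
    "property_type": "departamento",
    "operation_type": "venta",
    "sort": "newest",
    "page_start": 1,
}
```

URL mode suits repeatable runs over curated result pages; filter mode generates the result set dynamically from the search parameters.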

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| posting_id | Unique identifier for the property listing (useful for deduplication). |
| url | Listing URL or relative path for direct access to the detail page. |
| posting_code | Internal posting reference code shown on the marketplace. |
| title | Listing headline including key attributes and location hints. |
| price_operation_types | Array describing operation type (sale/rent) and one or more prices with currency. |
| expenses | Additional recurring or administrative cost data when available. |
| main_features | Primary features such as area, bedrooms, bathrooms, parking, and age. |
| general_features | Standard amenities/features when available. |
| development_features | Development or project-related features for new builds when available. |
| highlighted_features | Featured amenities or promoted attributes highlighted by the seller. |
| flags_features | Special markers or flags associated with the listing. |
| antiquity | Property age indicator when provided. |
| publisher | Agent/agency/seller metadata (name, profile URL, logo, tags, type). |
| url_logo | Brand/logo reference associated with the listing or publisher when present. |
| real_estate_type | Property category such as house, apartment, office, land, etc. |
| units | Unit data for projects or multi-unit developments when present. |
| publication | Publication timing fields for market trend and recency analysis. |
| premier | Whether the listing is marked as premium/promoted. |
| slot | Placement metadata indicating promoted positions when present. |
| slot_color | UI slot styling indicator when present. |
| house_info | Additional structural/architectural info when present. |
| description_normalized | Cleaned long-form description suitable for NLP and keyword extraction. |
| posting_location | Address and hierarchical location tree (country → province → city → zone → subzone). |
| posting_geolocation | Latitude/longitude and static map URL data for geo analytics. |
| visible_pictures | Array of image objects plus additional image count information. |
| status | Listing availability state (e.g., ONLINE). |
| posting_type | Listing classification (e.g., PROPERTY). |
| whatsapp | WhatsApp contact string when available. |
| modified_date | Last known modification timestamp for freshness monitoring. |
| from_url | Source results page URL used to discover the listing. |
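For downstream processing, the fields above map naturally onto a typed record. A minimal sketch covering a handful of the fields (the types here are assumptions, and the class is illustrative rather than the scraper's actual model):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Listing:
    """Illustrative subset of the scraper's output fields; types are assumed."""
    posting_id: str
    url: str
    title: str
    price_operation_types: list[dict[str, Any]] = field(default_factory=list)
    main_features: dict[str, Any] = field(default_factory=dict)
    status: str = "ONLINE"
    modified_date: Optional[str] = None

    def is_online(self) -> bool:
        # Availability check based on the documented status field.
        return self.status == "ONLINE"
```

A model like this makes missing fields explicit (defaults instead of KeyErrors) when feeding the data into a CRM or dashboard.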

Example Output

[
      {
            "posting_id": "146903999",
            "url": "/propiedades/clasificado/alclcain-alquilo-casa-oficina-av-los-ingenieros-485-la-molina-146903999.html",
            "posting_code": "CASA OFICINA",
            "title": "Alquilo Casa Oficina Av Los Ingenieros 485 - La Molina 200 m² A. C.",
            "price_operation_types": [
                  {
                        "operation_type": { "name": "Alquiler", "operation_type_id": "2" },
                        "prices": [
                              { "currency_id": "6", "amount": 5500, "formatted_amount": "5,500", "currency": "S/" },
                              { "currency_id": "2", "amount": 1500, "formatted_amount": "1,500", "currency": "USD" }
                        ]
                  }
            ],
            "main_features": {
                  "c_f_t100": { "label": "Superficie total", "measure": "m²", "value": "260" },
                  "c_f_t101": { "label": "Superficie techada", "measure": "m²", "value": "200" },
                  "c_f_t2": { "label": "Dormitorios", "value": "8" },
                  "c_f_t3": { "label": "Baños", "value": "3" },
                  "c_f_t7": { "label": "Estacionamiento", "value": "1" }
            },
            "publisher": {
                  "publisher_id": "102518169",
                  "name": "ALEJANDRO MALAGA Agente]",
                  "url": "/inmobiliarias/alejandro-malaga-agente_102518169-inmuebles.html",
                  "premier": true
            },
            "posting_location": {
                  "address": { "name": "AV LOS INGENIEROS 485, LA MOLINA", "visibility": "EXACT" },
                  "location": { "name": "Santa Patricia Etapa Iii", "label": "SUBZONA" }
            },
            "posting_geolocation": {
                  "geolocation": { "latitude": -12.066212, "longitude": -76.9493649 }
            },
            "visible_pictures": {
                  "pictures": [
                        { "order": 1, "url730x532": "https://img10.naventcdn.com/avisos/111/01/46/90/39/38/720x532/1543455438.jpg", "title": "Casa de 8 habitaciones, Lima" }
                  ],
                  "additional_pictures_count": 33
            },
            "status": "ONLINE",
            "posting_type": "PROPERTY",
            "whatsapp": "51 956567336",
            "modified_date": "2025-07-04T11:28:54-0400",
            "from_url": "https://www.adondevivir.com/casas-en-alquiler.html"
      }
]
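Because prices are nested per operation type and per currency, a common first step is flattening them into one row per (operation, currency) pair. A sketch of that transformation against a trimmed version of the record above (helper name is hypothetical):

```python
def flatten_prices(record: dict) -> list[dict]:
    """Yield one flat row per operation-type/currency price combination."""
    rows = []
    for op in record.get("price_operation_types", []):
        op_name = op.get("operation_type", {}).get("name")
        for price in op.get("prices", []):
            rows.append({
                "posting_id": record["posting_id"],
                "operation": op_name,
                "currency": price.get("currency"),
                "amount": price.get("amount"),
            })
    return rows

# Trimmed version of the example record above.
record = {
    "posting_id": "146903999",
    "price_operation_types": [
        {
            "operation_type": {"name": "Alquiler", "operation_type_id": "2"},
            "prices": [
                {"currency_id": "6", "amount": 5500, "currency": "S/"},
                {"currency_id": "2", "amount": 1500, "currency": "USD"},
            ],
        }
    ],
}
# flatten_prices(record) -> two rows: 5,500 S/ and 1,500 USD, both "Alquiler"
```

Flat rows like these load directly into a spreadsheet or SQL table for cross-currency market analysis.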

Directory Structure Tree

Adondevivir Property Listings Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── cli.py
│   ├── core/
│   │   ├── browser_manager.py
│   │   ├── request_queue.py
│   │   ├── retry_policy.py
│   │   └── throttler.py
│   ├── extractors/
│   │   ├── listings_page_parser.py
│   │   ├── listing_detail_parser.py
│   │   ├── features_normalizer.py
│   │   ├── pricing_parser.py
│   │   ├── location_parser.py
│   │   └── media_parser.py
│   ├── pipelines/
│   │   ├── url_mode.py
│   │   ├── filters_mode.py
│   │   └── pagination.py
│   ├── models/
│   │   ├── listing.py
│   │   ├── publisher.py
│   │   └── input_schema.py
│   ├── outputs/
│   │   ├── dataset_writer.py
│   │   ├── validators.py
│   │   └── exporters.py
│   └── config/
│       ├── settings.example.json
│       └── logging.yaml
├── data/
│   ├── input.sample.json
│   ├── urls.sample.txt
│   └── sample_output.json
├── tests/
│   ├── test_pricing_parser.py
│   ├── test_location_parser.py
│   └── test_listings_parser.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md

Use Cases

  • Real estate agencies use it to build a searchable listings database, so they can respond faster to buyer/renter demand with up-to-date inventory.
  • Property investors use it to track pricing shifts by district and property type, so they can spot undervalued opportunities earlier.
  • Market researchers use it to measure listing freshness and volume trends, so they can publish reliable Peru housing market insights.
  • PropTech teams use it to feed recommendation and matching systems, so users can discover better-fit properties with smarter filters.
  • Operations teams use it to monitor premium/promoted listings and agent activity, so they can benchmark marketing performance across areas.

FAQs

How do I choose between URL mode and search-filter mode? If you already have curated result-page URLs (from specific searches, categories, or cities), use URL mode for repeatable runs. If you want discovery-based collection (e.g., “lima” + “departamento” + “venta”), use search-filter mode to generate results dynamically and scrape from a chosen start page.

What’s the best way to avoid blocks and reduce failures? Use rotating residential IPs aligned with the target region (Peru/PE), keep max_items_per_url moderate during long runs, and enable ignore_url_failures so a single bad page does not stop the entire job. Increasing max_retries_per_url helps recover from transient errors.
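That combination of settings can be sketched as a simple driver loop (the function and parameter names below mirror the settings mentioned above but are illustrative, not the actor's internals):

```python
import time

def scrape_all(urls, scrape_one, max_retries_per_url=3, ignore_url_failures=True):
    """Retry each URL with backoff; optionally skip URLs that keep failing.

    scrape_one is a placeholder for whatever fetch-and-parse callable is in use.
    """
    results, failed = [], []
    for url in urls:
        for attempt in range(1, max_retries_per_url + 1):
            try:
                results.extend(scrape_one(url))
                break  # URL succeeded; move to the next one
            except Exception:
                if attempt == max_retries_per_url:
                    if not ignore_url_failures:
                        raise  # abort the whole run on a persistent failure
                    failed.append(url)  # record and continue with remaining URLs
                else:
                    time.sleep(2 ** attempt)  # exponential backoff before retrying
    return results, failed
```

With `ignore_url_failures=True`, one persistently bad page ends up in the `failed` list instead of stopping the entire job, matching the behavior described above.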

Does it scrape only the list page or also detailed listing data? It collects detailed listing objects including nested pricing structures, publisher info, location hierarchy, geo-coordinates, and media references. If a field is missing on the site for a specific listing, the scraper returns it as empty/null rather than guessing.

How should I deduplicate data across multiple runs? Use posting_id as the primary key and store modified_date for freshness checks. When the same posting_id appears again, update the record if modified_date is newer, and keep history if you’re tracking market changes.
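The upsert logic described above is straightforward to sketch (assuming timestamps share a consistent timezone offset, so string comparison orders them correctly):

```python
def upsert(store: dict, record: dict) -> bool:
    """Insert or update a listing keyed by posting_id; return True if stored.

    Keeps the newer record based on modified_date. The ISO-like timestamps
    compare correctly as strings only if their timezone offsets are consistent.
    """
    pid = record["posting_id"]
    existing = store.get(pid)
    if existing is None:
        store[pid] = record       # first time this posting_id is seen
        return True
    if (record.get("modified_date") or "") > (existing.get("modified_date") or ""):
        store[pid] = record       # incoming record is fresher; replace
        return True
    return False                  # existing record is as new or newer
```

If you are tracking market changes over time, append the superseded record to a history table before replacing it rather than discarding it.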


Performance Benchmarks and Results

Primary Metric: Typically processes 20–50 listings per minute per worker on stable connections when scraping mixed result pages with detail enrichment.

Reliability Metric: 96–99% successful page completion rate on multi-URL runs when proxies are enabled and max_retries_per_url is set to 2–3.

Efficiency Metric: Uses a bounded concurrency approach to keep memory steady; long runs commonly stay under 500–800 MB RAM per worker with controlled page reuse.

Quality Metric: 90–98% field completeness on listings that include full metadata, with strongest coverage for pricing, title, location hierarchy, and media counts.
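The bounded-concurrency approach referenced above is typically a semaphore around page fetches, which caps how many pages are in flight (and therefore in memory) at once. A minimal asyncio sketch, where the fetch callable and worker count are placeholders:

```python
import asyncio

async def bounded_gather(urls, fetch, max_concurrency=8):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with sem:          # blocks until a slot frees up
            return await fetch(url)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(u) for u in urls))
```

Tuning `max_concurrency` trades throughput against the per-worker memory ceiling cited in the efficiency metric.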


Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
