
Feature/ingestion/deduplicate listings #43

Open

immangat wants to merge 42 commits into main from feature/ingestion/deduplicate_listings

@immangat
Contributor

Description

Issue Link: [MVP] Dealing with duplicate Listings (linked in the Development section)

This pull request is part of the data-cleaning work in the ingestion module. It adds a check that prevents duplicate listings from being saved to the datastore.
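The PR description does not include the pipeline code itself. As a rough illustration of how a duplicate check like this is commonly written in Scrapy, here is a minimal sketch of an item pipeline that drops listings already seen in the current crawl or already present in the CSV datastore. The class name `DuplicateListingsPipeline`, the `listing_id` field, and the `listings.csv` path are hypothetical and not taken from this PR.

```python
import csv
import os

try:
    # Scrapy's standard exception for discarding an item in a pipeline.
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so the sketch runs without Scrapy installed
    class DropItem(Exception):
        pass


class DuplicateListingsPipeline:
    """Hypothetical pipeline: drops items whose listing id is already known,
    either from the existing CSV datastore or from earlier in this crawl."""

    def __init__(self, csv_path="listings.csv", id_field="listing_id"):
        self.csv_path = csv_path  # assumed location of the CSV datastore
        self.id_field = id_field  # assumed name of the unique listing field
        self.seen_ids = set()

    def open_spider(self, spider):
        # Preload ids already saved to the CSV so a rerun does not re-save them.
        if os.path.exists(self.csv_path):
            with open(self.csv_path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    listing_id = row.get(self.id_field)
                    if listing_id:
                        self.seen_ids.add(listing_id)

    def process_item(self, item, spider):
        # CSV values are read back as strings, so compare ids as strings.
        listing_id = str(item[self.id_field])
        if listing_id in self.seen_ids:
            raise DropItem(f"Duplicate listing {listing_id}")
        self.seen_ids.add(listing_id)
        return item
```

To enable a pipeline like this, it would be registered in the project's `settings.py` through Scrapy's `ITEM_PIPELINES` setting, for example `{"airbnb_listings.pipelines.DuplicateListingsPipeline": 300}` (module path assumed). Scrapy instantiates the class without arguments in that case, so the default path and field name apply.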

Type of Change


  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Not yet tested; tests will be added in a future pull request.

Checklist:

Before you submit your pull request, please make sure you have completed the following:

  • I have read the CONTRIBUTING document.
  • I have checked that my code adheres to the code style of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.


Additional Notes

  • The pipeline will be modified in a future PR to make it more general, so that it can work with all types of datastores, not only CSV.
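One common way to make a duplicate check independent of the datastore is to have it depend only on a small storage interface, with each backend supplying the ids it already holds. The sketch below illustrates that idea under assumptions: `ListingStore`, `CsvListingStore`, `InMemoryListingStore`, and `Deduplicator` are hypothetical names, and the planned follow-up PR may take a different approach.

```python
import csv
import os
from typing import Iterable, Protocol, Set


class ListingStore(Protocol):
    """Hypothetical interface that every datastore backend would implement."""

    def existing_ids(self) -> Iterable[str]: ...


class CsvListingStore:
    """Backend that reads listing ids from a CSV file."""

    def __init__(self, path: str, id_field: str = "listing_id") -> None:
        self.path = path
        self.id_field = id_field  # assumed name of the unique listing field

    def existing_ids(self) -> Iterable[str]:
        if not os.path.exists(self.path):
            return []
        with open(self.path, newline="", encoding="utf-8") as f:
            # Materialize the list before the file is closed.
            return [row[self.id_field] for row in csv.DictReader(f) if row.get(self.id_field)]


class InMemoryListingStore:
    """Stand-in for any other backend, such as a database table."""

    def __init__(self, ids: Iterable[object]) -> None:
        self._ids = [str(i) for i in ids]

    def existing_ids(self) -> Iterable[str]:
        return list(self._ids)


class Deduplicator:
    """Backend-agnostic duplicate check that relies only on ListingStore."""

    def __init__(self, store: ListingStore) -> None:
        self.seen: Set[str] = set(store.existing_ids())

    def is_new(self, listing_id: object) -> bool:
        # Normalize to str so ids from CSV and from scraped items compare equal.
        key = str(listing_id)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

With this split, supporting a new datastore only requires a class that provides `existing_ids()`; the deduplication logic itself stays unchanged.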

…ures/listing_spider

# Conflicts:
#	ingestion/README.md
#	ingestion/airbnb_scrapping/airbnb_listings/items.py
#	ingestion/airbnb_scrapping/airbnb_listings/settings.py
#	ingestion/airbnb_scrapping/scrapy.cfg
#	ingestion/requirements.txt
immangat added the enhancement New feature or request label Feb 22, 2024
immangat added this to the 1. MVP milestone Feb 22, 2024
immangat self-assigned this Feb 22, 2024
immangat requested a review from umsu2 as a code owner February 22, 2024 01:51
immangat linked an issue Feb 22, 2024 that may be closed by this pull request

Labels

enhancement New feature or request

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[MVP] Dealing with duplicate Listings

1 participant