This repository was archived by the owner on Jul 3, 2023. It is now read-only.

Write a Scrapy spider that fetches a tag (#vacancy, others?) and extracts toots/updates found for that tag. #41

@berkes

Write a Scrapy spider that fetches a tag (#vacancy, others?) and extracts toots/updates found for that tag.

Details

The spider should start from a list of seed instances and follow references to other instances, fetching toots/updates for a given hashtag (e.g. #vacancy, #job).
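As a rough sketch of the crawl described above, the helpers below build the start URLs for a list of seeds and pick out instances that have not been visited yet. They assume each instance exposes the Mastodon-style public tag timeline at `/api/v1/timelines/tag/<tag>` and that each status carries a `uri` field pointing at its origin instance. The function names (`tag_timeline_url`, `crawl_plan`, `discover_instances`) are hypothetical and not part of this project.

```python
from urllib.parse import quote, urlencode, urlsplit


def tag_timeline_url(instance, tag, limit=40):
    """Build the public tag timeline URL for one instance (assumed Mastodon API)."""
    tag = quote(tag.lstrip("#"))
    return f"https://{instance}/api/v1/timelines/tag/{tag}?{urlencode({'limit': limit})}"


def crawl_plan(seeds, tags):
    """One start URL per (seed instance, tag) pair."""
    return [tag_timeline_url(instance, tag) for instance in seeds for tag in tags]


def discover_instances(statuses, known):
    """Return hosts seen in status URIs that are not in the known set yet."""
    found = set()
    for status in statuses:
        host = urlsplit(status.get("uri", "")).hostname
        if host and host not in known:
            found.add(host)
    return found
```

In a Scrapy spider, the URLs from `crawl_plan` could serve as `start_urls`, and the parse callback could yield a new `scrapy.Request` for every host that `discover_instances` returns.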

Deliverable

  • It should try to deduplicate toots. When instance "example.com" has a toot by "@company@example.org" and "example.org" has this toot too, it should appear only once in the datafile.
  • If an update is manually re-tooted (i.e. text copied as a new update) it may appear multiple times. Deduplicating based on the content of an update is not important.
  • Boosts and replies should be ignored (for now).
  • If tooling is required to set up the environment (pipenv etc.), provide a command that gets it running for devs and CI.
  • It should be one command, so that integration is easy. Preferably a command that runs and then stops, rather than a daemon.
  • Scrapy is preferred, as other parts of this project already use it.
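The filtering and de-duplication rules above could look roughly like the sketch below. It assumes Mastodon-style status objects, where `reblog` is non-null for boosts, `in_reply_to_id` is non-null for replies, and `uri` is the same for a toot on every instance that holds a copy of it. The function names are hypothetical.

```python
def is_original(status):
    """True for toots that are neither boosts nor replies."""
    return status.get("reblog") is None and status.get("in_reply_to_id") is None


def unique_originals(statuses, seen=None):
    """Yield each original toot once, keyed on its federated URI.

    Pass the same `seen` set across calls to de-duplicate between instances.
    """
    seen = set() if seen is None else seen
    for status in statuses:
        if not is_original(status):
            continue
        key = status["uri"]
        if key in seen:
            continue
        seen.add(key)
        yield status
```

Manually copied text gets a new `uri`, so such re-toots still appear more than once, as the requirements allow. In a Scrapy project this logic would probably live in an item pipeline, and a single `scrapy crawl <spider>` invocation would run the crawl and then exit.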

Metadata

Labels: fedifind (issues related to the intermediate "Fedi Find" project), scrapy, task
