This repository was archived by the owner on Jul 3, 2023. It is now read-only.

Intermediate Search: Implement a crawler that spans Mastodon instances #38

@berkes

Description


Expand the crawler so that it crawls a list of Mastodon instances and extracts public profiles with #for-hire tags.
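A minimal sketch of the tag check, assuming each scraped profile is a dict whose `note` key holds the bio as HTML, as in Mastodon's account JSON. The helper names are hypothetical, and treating `#forhire`, `#for_hire` and `#for-hire` as the same tag is an assumption. Mastodon does not allow hyphens in hashtags, so a bio containing `#for-hire` is linked only as `#for`; matching on the plain text of the bio sidesteps that.

```python
import html
import re

# Hypothetical accepted tag, compared after normalization (lowercase, no '-' or '_').
FOR_HIRE_TAGS = {"forhire"}

# Crude HTML tag stripper; replacing tags with "" keeps "#<span>tag</span>" joined as "#tag".
TAG_RE = re.compile(r"<[^>]+>")
# A '#' followed by word characters and hyphens, e.g. "#for-hire" or "#ForHire".
HASHTAG_RE = re.compile(r"#([\w-]+)")


def hashtags_in(note_html: str) -> set[str]:
    """Return the normalized hashtags found in an HTML bio."""
    text = html.unescape(TAG_RE.sub("", note_html))
    # Normalize so spelling variants of the same tag compare equal.
    return {re.sub(r"[-_]", "", tag.lower()) for tag in HASHTAG_RE.findall(text)}


def has_for_hire_tag(profile: dict) -> bool:
    """True if the profile bio carries one of the accepted for-hire tags."""
    return bool(hashtags_in(profile.get("note") or "") & FOR_HIRE_TAGS)
```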

Details

For the intermediate product "for hire search", we need to extend the Scrapy spider to crawl across multiple Mastodon instances. Currently, the proof of concept only crawls a single instance. TODO: release this proof-of-concept scraper in a flockingbird repo.
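One way to span several instances is to derive one start URL per instance and limit the crawl to those hosts. A stdlib sketch, assuming the spider starts from Mastodon's public profile directory API (`/api/v1/directory` with `local=true`, so each instance lists only its own accounts); whether the proof of concept uses that endpoint or the HTML pages is not stated here, and `start_urls_for` and `normalize_instance` are hypothetical helpers. Pagination via `offset` is left out.

```python
from urllib.parse import urlencode, urlsplit

# Assumed entry point: Mastodon's public profile directory API.
DIRECTORY_PATH = "/api/v1/directory"


def normalize_instance(entry: str) -> str:
    """Reduce 'https://Mastodon.Social/' or 'mastodon.social' to 'mastodon.social'."""
    entry = entry.strip()
    if "//" not in entry:
        entry = "https://" + entry
    # .hostname is lowercased and None for an empty host.
    return urlsplit(entry).hostname or ""


def start_urls_for(instances: list[str], limit: int = 40) -> tuple[list[str], list[str]]:
    """Build one directory URL per unique instance, preserving input order."""
    hosts: list[str] = []
    for entry in instances:
        host = normalize_instance(entry)
        if host and host not in hosts:
            hosts.append(host)
    query = urlencode({"local": "true", "limit": limit})
    start_urls = [f"https://{host}{DIRECTORY_PATH}?{query}" for host in hosts]
    # The host list doubles as an allow-list so the crawl stays on the listed instances.
    return start_urls, hosts
```

In a Scrapy spider, the two returned lists could populate the spider's `start_urls` and `allowed_domains` attributes. For example, `["mastodon.social", "https://fosstodon.org/", "Mastodon.Social"]` yields the two hosts `mastodon.social` and `fosstodon.org`, each with one directory URL.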

"Intermediate search" is explained in #37.

Deliverable

  • Given a list of instances (as JSON, as plain text, or on STDIN), we crawl each instance on that list for public profiles.
  • As in the proof-of-concept scraper, we only index public data.
  • As in the proof-of-concept scraper, we adhere to noindex, robots.txt, etc.
  • It returns a JSON document either per instance or for all instances combined, structurally similar to the output of the proof-of-concept scraper.
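Reading the instance list for the first deliverable could look like this stdlib sketch. It assumes JSON input is an array of domain strings, plain text has one domain per line (with optional `#` comments), and `-` as a path means STDIN, a common CLI convention; these formats and the function names are hypothetical.

```python
import json
import sys


def parse_instance_list(raw: str) -> list[str]:
    """Parse a JSON array of domains, or plain text with one domain per line."""
    stripped = raw.strip()
    if stripped.startswith("["):
        data = json.loads(stripped)
        if not isinstance(data, list) or not all(isinstance(i, str) for i in data):
            raise ValueError("JSON input must be an array of instance domains")
        return [item.strip() for item in data if item.strip()]
    instances = []
    for line in stripped.splitlines():
        # Drop trailing comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if line:
            instances.append(line)
    return instances


def load_instances(path: str = "-") -> list[str]:
    """Read the list from a file, or from STDIN when path is '-'."""
    if path == "-":
        return parse_instance_list(sys.stdin.read())
    with open(path, encoding="utf-8") as handle:
        return parse_instance_list(handle.read())
```

Auto-detecting the format from the first character keeps a single entry point for all three input sources.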
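The noindex and robots.txt checks can be sketched with Python's standard library: `urllib.robotparser` evaluates robots.txt rules, and a small `html.parser` subclass reads `<meta name="robots">` directives. In Scrapy itself, robots.txt is normally enforced by setting `ROBOTSTXT_OBEY = True`. The user agent string and helper names below are hypothetical, and fetching the robots.txt and page bodies is left to the crawler.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

USER_AGENT = "fedifind-bot"  # Hypothetical crawler user agent.


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # Attribute names arrive lowercased; values may be None.
        if (attrs.get("name") or "").lower() == "robots":
            for part in (attrs.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def page_is_noindex(html_text: str) -> bool:
    """True if the page opts out of indexing ('none' implies noindex)."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return bool({"noindex", "none"} & parser.directives)


def robots_allows(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Evaluate an already-fetched robots.txt body for one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    rules.modified()  # Mark the rules as loaded so can_fetch() does not refuse everything.
    return rules.can_fetch(agent, url)


def may_index(robots_txt: str, url: str, html_text: str) -> bool:
    """Index a profile only if robots.txt allows the URL and the page is not noindex."""
    return robots_allows(robots_txt, url) and not page_is_noindex(html_text)
```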
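Grouping scraped profiles by instance makes both output variants cheap to produce: one combined JSON document, or one document per instance. This sketch assumes each profile is a dict with an `instance` key; the other field names and the `{instance}.json` naming are hypothetical, and the real layout should follow the proof-of-concept scraper's output.

```python
import json
from collections import defaultdict


def group_by_instance(profiles: list[dict]) -> dict[str, list[dict]]:
    """Map instance domain -> profiles scraped from it, in first-seen order."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for profile in profiles:
        grouped[profile["instance"]].append(profile)
    return dict(grouped)


def combined_document(profiles: list[dict]) -> str:
    """One JSON document covering all instances."""
    return json.dumps({"instances": group_by_instance(profiles)}, indent=2)


def per_instance_documents(profiles: list[dict]) -> dict[str, str]:
    """One JSON document per instance, keyed by a suggested filename."""
    return {
        f"{instance}.json": json.dumps({"instance": instance, "profiles": items}, indent=2)
        for instance, items in group_by_instance(profiles).items()
    }
```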

Metadata

Labels

fedifind (issues related to the intermediate "Fedi Find" project), scrapy, task
