This repository was archived by the owner on Jul 3, 2023. It is now read-only.

Write a script or scrapy spider that retrieves a list of (mastodon) instances. #40

@berkes

Description

Write a script or scrapy spider that retrieves a list of (mastodon) instances.

Details

For #37, and as input for #38, we need an up-to-date list of Mastodon instances, later extendable to Pleroma, Friendica and other fediverse instances. The source must permit fetching this data (i.e. don't just copy the first fediverse list you find, as its terms may not allow copying it).

This list acts as input (a text file, JSON or STDIN) to the spider in #38 that finds "for-hire" profiles on each instance.

Deliverable

  • A command that fetches instances and presents those in plain text. Preferably to STDOUT so the integration can choose to pipe it elsewhere or redirect into a file.
  • If this command requires tooling and environment setup, we need additional commands to set up that environment (in CI and on a server). But simpler (i.e. bash+curl, or a single binary) is preferred over pipenv, rbenv, nodejs/npm/npx and so on.
  • It should run in reasonable time: i.e. take minutes or seconds, not days and gigabytes of bandwidth, to fetch a list.
  • It should not hammer servers: if it needs to crawl (i.e. with Scrapy), it must set conservative delays.
  • It should advertise itself transparently to the server in an HTTP header, so that admins of services can contact us instead of just blocking us.
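
A minimal sketch of what such a command could look like, using only the Python standard library so no extra environment setup is needed. Everything specific here is an assumption for illustration, not something decided in this issue: the source URL, the contact URL in the User-Agent, the two-second delay, and the JSON shape (`{"instances": [{"name": "..."}]}`) are all hypothetical and would need to match whichever source actually permits fetching.

```python
"""Sketch: print fediverse instance domains, one per line, to STDOUT."""
import json
import sys
import time
import urllib.request

# Hypothetical values: replace with real contact details and tune the delay.
USER_AGENT = "fedifind-instance-lister/0.1 (+https://example.org/contact)"
DELAY_SECONDS = 2.0  # conservative pause between consecutive requests


def build_request(url, user_agent=USER_AGENT):
    """Build a request that identifies this tool in the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def extract_domains(payload):
    """Return sorted, de-duplicated domains from the ASSUMED JSON shape:
    {"instances": [{"name": "mastodon.example"}, ...]}
    """
    data = json.loads(payload)
    names = ((item.get("name") or "").strip().lower()
             for item in data.get("instances", []))
    return sorted({name for name in names if name})


def fetch_domains(urls, delay=DELAY_SECONDS):
    """Fetch each source URL in turn, sleeping between requests."""
    domains = set()
    for index, url in enumerate(urls):
        if index:  # no need to wait before the very first request
            time.sleep(delay)
        with urllib.request.urlopen(build_request(url), timeout=30) as response:
            domains.update(extract_domains(response.read().decode("utf-8")))
    return sorted(domains)


def main(argv):
    """Print one domain per line; source URLs are given as arguments."""
    if not argv:
        print("usage: list_instances.py SOURCE_URL [SOURCE_URL ...]",
              file=sys.stderr)
        return 2
    for domain in fetch_domains(argv):
        print(domain)
    return 0
```

Saved as a script (the name `list_instances.py` is hypothetical) and wired up with `sys.exit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard, the output can be piped into the spider from #38 or redirected into a file. Errors and usage go to STDERR so STDOUT stays a clean list.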

Metadata

Assignees

No one assigned

    Labels

    fedifind (Issues related to the intermediate "Fedi Find" project), scrapy, task
