This repository was archived by the owner on Jul 3, 2023. It is now read-only.
Write a Scrapy spider that fetches a hashtag (#vacancy, maybe others?) and extracts the toots/updates posted under that tag.
Details
This spider should take a list of instances to start from (seeds) and follow on to other instances to fetch toots/updates for a given hashtag (e.g. #vacancy, #job).
Deliverable
It should deduplicate toots: when instance "example.com" has a toot by "@company@example.org" and "example.org" has the same toot, it should appear only once in the data file.
If an update is manually re-tooted (i.e. its text copied into a new update), it may appear multiple times; deduplicating on the content of an update is not important.
Boosts and replies should be ignored (for now).
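A minimal filter, assuming items are raw Mastodon status dicts: in that entity a boost carries the boosted status in `reblog`, and a reply has a non-null `in_reply_to_id`. The function name `is_original_toot` is hypothetical; it could be called in the spider's `parse` before yielding, or wrapped in a pipeline that raises `DropItem`. Note that it also drops thread continuations, since those are replies to the author's own toot.

```python
def is_original_toot(status: dict) -> bool:
    """True for a plain toot: neither a boost nor a reply."""
    is_boost = status.get("reblog") is not None
    is_reply = status.get("in_reply_to_id") is not None
    return not (is_boost or is_reply)
```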
If tooling is required to set up the environment (pipenv etc.), provide the command that gets it running for developers and CI.
It should be a single command, so that integration is easy. A command that runs once and then exits is preferred over a daemon.
Scrapy is preferred because other parts of this project already use it.