The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.
-- Funes the Memorious, Jorge Luis Borges
memorious is a lightweight web scraping toolkit. It supports scrapers that
collect structured or unstructured data. Its goals are to:
- Make crawlers modular and simple tasks reusable
- Provide utility functions for common tasks such as data storage and HTTP session management
- Integrate crawlers with the Aleph and FollowTheMoney ecosystem
- Get out of your way as much as possible
memorious is part of the OpenAleph suite but can be used standalone as well.
When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.
memorious handles this by managing a set of crawlers, each of which
can be composed of multiple stages. Each stage is implemented using a
Python function, which can be reused across different crawlers.
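The stage model can be sketched in plain Python. The example below does not import memorious: `SketchContext`, `run`, and the four stage functions are hypothetical stand-ins, and the URLs and page bodies are made up. The only assumption about the library is that a stage is a function that receives a context and a data dictionary and hands results to the next stage by emitting them.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SketchContext:
    """Hypothetical stand-in for the context object passed to a stage."""
    handlers: dict                      # stage name -> {rule: next stage name}
    current: str = ""                   # name of the stage being executed
    queue: list = field(default_factory=list)
    records: list = field(default_factory=list)

    def emit(self, data, rule="pass"):
        # Queue `data` for the stage wired to `rule`; drop it if none is wired.
        target = self.handlers.get(self.current, {}).get(rule)
        if target is not None:
            self.queue.append((target, data))


def paginate(context, data):
    # Walk a pretend index and emit one item per result.
    for number in range(1, data["pages"] + 1):
        context.emit({"url": f"https://example.org/item/{number}"})


def fetch(context, data):
    # A real stage would download the URL; this one fakes the response body.
    item_id = data["url"].rsplit("/", 1)[-1]
    context.emit({**data, "body": f"<h1>Item {item_id}</h1>"})


def parse(context, data):
    # Extract the title from the fake HTML body.
    title = re.search(r"<h1>(.*?)</h1>", data["body"]).group(1)
    context.emit({"url": data["url"], "title": title})


def store(context, data):
    # Terminal stage: keep the record in memory instead of a database.
    context.records.append(data)


def run(stages, handlers, start, data):
    """Execute stages in FIFO order until no work is left."""
    context = SketchContext(handlers=handlers)
    context.queue.append((start, data))
    while context.queue:
        context.current, item = context.queue.pop(0)
        stages[context.current](context, item)
    return context.records


STAGES = {"paginate": paginate, "fetch": fetch, "parse": parse, "store": store}
HANDLERS = {
    "paginate": {"pass": "fetch"},
    "fetch": {"pass": "parse"},
    "parse": {"pass": "store"},
}

records = run(STAGES, HANDLERS, "paginate", {"pages": 2})
```

Because each stage only sees its own input and emits its own output, `fetch` or `store` could be reused unchanged in a second crawler with a different `paginate` and `parse`.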
The basic steps of writing a Memorious crawler:
- Create a YAML crawler configuration file
- Add different stages
- Write code for stage operations (optional)
- Test, rinse, repeat
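As a rough illustration of the first two steps, the sketch below writes a crawler definition as a Python dictionary shaped like a YAML configuration, then checks that every stage hands off to a stage that exists. The key names (`name`, `pipeline`, `method`, `params`, `handle`), the method names, the crawler name, and the `example.stages:parse_page` path are all assumptions made for illustration and may not match the current release, so consult the documentation for the actual schema. `dangling_handlers` is a hypothetical helper, not part of memorious.

```python
# Assumed layout, written as a dict; a real crawler would live in a YAML file.
CRAWLER = {
    "name": "example_crawler",
    "pipeline": {
        "init": {
            "method": "seed",
            "params": {"urls": ["https://example.org/"]},
            "handle": {"pass": "fetch"},
        },
        "fetch": {"method": "fetch", "handle": {"pass": "parse"}},
        "parse": {
            # Hypothetical custom stage function (the optional third step).
            "method": "example.stages:parse_page",
            "handle": {"store": "store"},
        },
        "store": {"method": "directory"},
    },
}


def dangling_handlers(config):
    """Return (stage, rule, target) triples whose target stage is undefined."""
    pipeline = config["pipeline"]
    return [
        (name, rule, target)
        for name, stage in pipeline.items()
        for rule, target in stage.get("handle", {}).items()
        if target not in pipeline
    ]


problems = dangling_handlers(CRAWLER)
```

A check like this is cheap to run during the "test, rinse, repeat" step: an empty `problems` list means every handler points at a defined stage.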
The documentation for Memorious is available at
docs.investigraph.dev/lib/memorious.
Feel free to edit the source files in the docs folder and send pull requests for improvements.
To serve the documentation locally, run mkdocs serve
memorious, (C) -2024 Organized Crime and Corruption Reporting Project
memorious, (C) 2025 Data and Research Center – DARC
memorious4, (C) 2026 Data and Research Center – DARC
memorious4 is licensed under the AGPLv3 or later license.
Prior to version 4.0.0, memorious was released under the MIT license.