Memorious

The solitary and lucid spectator of a multiform, instantaneous and almost intolerably precise world.

-- Funes the Memorious, Jorge Luis Borges

memorious is a lightweight web scraping toolkit. It supports scrapers that collect structured or unstructured data. Its design goals are to:

Make crawlers modular and simple tasks reusable
Provide utility functions for common tasks such as data storage and HTTP session management
Integrate crawlers with the Aleph and FollowTheMoney ecosystem
Get out of your way as much as possible

memorious is part of the OpenAleph suite but can be used standalone as well.

Design

When writing a scraper, you often need to paginate through an index page, then download an HTML page for each result, and finally parse that page and insert or update a record in a database.

memorious handles this by managing a set of crawlers, each of which can be composed of multiple stages. Each stage is implemented using a Python function, which can be reused across different crawlers.
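The staged design described above can be sketched in plain Python. This is an illustration of the idea only, not memorious's actual API: the `Context` class, the `run` dispatcher, the `PIPELINE` table and the in-memory "website" are all hypothetical stand-ins, and the `(context, data)` stage signature is an assumption made for the example.

```python
from collections import deque

# Hypothetical stand-in for the object a framework would hand to each stage.
class Context:
    def __init__(self, queue, next_stage):
        self._queue = queue
        self._next_stage = next_stage

    def emit(self, data):
        # Pass the data on to the next stage, if one is configured.
        if self._next_stage is not None:
            self._queue.append((self._next_stage, data))

# Fake "website" and "database" so the sketch runs offline.
INDEX_PAGES = {1: ["/item/a", "/item/b"], 2: ["/item/c"]}
ITEM_PAGES = {
    "/item/a": "<h1>Alpha</h1>",
    "/item/b": "<h1>Beta</h1>",
    "/item/c": "<h1>Gamma</h1>",
}
DATABASE = {}

def paginate(context, data):
    # Walk every index page and emit one task per listed result.
    for page in sorted(INDEX_PAGES):
        for path in INDEX_PAGES[page]:
            context.emit({"path": path})

def fetch(context, data):
    # "Download" the HTML page for a single result.
    context.emit({**data, "html": ITEM_PAGES[data["path"]]})

def parse(context, data):
    # Extract the title from the (very simple) HTML.
    title = data["html"].removeprefix("<h1>").removesuffix("</h1>")
    context.emit({"path": data["path"], "title": title})

def store(context, data):
    # Insert or update the record, keyed by path.
    DATABASE[data["path"]] = data["title"]

# Each stage maps to (function, name of the next stage).
PIPELINE = {
    "paginate": (paginate, "fetch"),
    "fetch": (fetch, "parse"),
    "parse": (parse, "store"),
    "store": (store, None),
}

def run(pipeline, first_stage):
    # Process queued tasks until no stage has work left.
    queue = deque([(first_stage, {})])
    while queue:
        name, data = queue.popleft()
        func, next_stage = pipeline[name]
        func(Context(queue, next_stage), data)

run(PIPELINE, "paginate")
print(DATABASE)
```

Because each stage only sees its input and an `emit` call, a function such as `fetch` or `store` could be reused unchanged in another pipeline table.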

The basic steps of writing a memorious crawler are:

Create a YAML crawler configuration file
Add different stages
Write code for stage operations (optional)
Test, rinse, repeat
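For the "write code" and "test" steps, one possible approach is to exercise a custom stage function in isolation with a small test double. The sketch below assumes a stage takes `(context, data)` and forwards results through `context.emit`; `RecordingContext` and `normalize_title` are hypothetical names invented for this example, and the real context object in memorious may differ, so check the documentation before relying on this shape.

```python
class RecordingContext:
    """Hypothetical test double that records whatever a stage emits."""

    def __init__(self):
        self.emitted = []

    def emit(self, data):
        self.emitted.append(data)


def normalize_title(context, data):
    """Example custom stage: trim and collapse whitespace in a title."""
    title = " ".join(data.get("title", "").split())
    if not title:
        return  # drop records without a usable title
    context.emit({**data, "title": title})


def test_normalize_title():
    ctx = RecordingContext()
    # A messy title is cleaned up and passed on.
    normalize_title(ctx, {"url": "https://example.org/1", "title": "  Annual   report \n"})
    # A whitespace-only title is dropped, so nothing is emitted for it.
    normalize_title(ctx, {"url": "https://example.org/2", "title": "   "})
    assert ctx.emitted == [{"url": "https://example.org/1", "title": "Annual report"}]


test_normalize_title()
```

Keeping stage logic free of network or database calls, as here, makes the "test, rinse, repeat" loop fast.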

Documentation

The documentation for Memorious is available at docs.investigraph.dev/lib/memorious. Feel free to edit the source files in the docs folder and send pull requests for improvements.

To serve the documentation locally, run mkdocs serve

License and Copyright

memorious, (C) -2024 Organized Crime and Corruption Reporting Project

memorious4 is licensed under the AGPLv3 or later license.

Prior to version 4.0.0, memorious was released under the MIT license.

See NOTICE and LICENSE.

