
WIP: Archive node  #5188

@bedeho

Description

Unlock a lower replication factor and increase the overall safety of data object storage.

Remark

The purpose of the archival node is to cut costs by substituting more centralisation for redundancy, in a way that is deemed safe. Namely, we stop relying on storage nodes with a high replication factor as the primary and only long-term glacial storage solution. That is overkill, as true archival storage does not need low-latency availability. Instead, storage nodes are meant to serve as origin servers for distributors, which requires far less redundancy: perhaps a replication factor of only 2x is fine, so long as archival nodes are operating.

Now, the name "archival node" is perhaps a misnomer. This does not need to be a standalone node with a public API that other nodes can connect to for any kind of service or interaction. The bare minimum is just a script, ideally stateless, that stores every single data object ever successfully uploaded. It does not even need to respond to deletion events; that is overkill. It can use its local file system as the state for which data objects it has fully downloaded, e.g. keyed by data object ID, and compare each file against the on-chain size to know whether it is complete. If a download fails, the script simply abandons it and retries at a later interval. Any data object that actually needs to be recovered can be downloaded manually from the host using scp or some other really simple mechanism.
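A minimal sketch of the stateless sync pass described above, written in Python. Everything here is hypothetical and only illustrates the idea: `DataObject`, `list_objects` (standing in for however the script reads data object IDs and sizes from chain state) and `download` (standing in for fetching an object's bytes from a storage node) are assumptions, not existing APIs. Object IDs are assumed to be usable as file names.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DataObject:
    """Hypothetical view of an on-chain data object: its ID and declared size in bytes."""
    object_id: str
    size: int


def is_complete(archive_dir: str, obj: DataObject) -> bool:
    """The local file system is the only state: a file named after the object ID
    whose length equals the on-chain size counts as fully downloaded."""
    path = os.path.join(archive_dir, obj.object_id)
    return os.path.isfile(path) and os.path.getsize(path) == obj.size


def sync_once(
    archive_dir: str,
    objects: Iterable[DataObject],
    download: Callable[[str], bytes],  # hypothetical fetcher: object ID -> bytes
) -> List[str]:
    """One pass over all known objects. Incomplete ones are (re)downloaded;
    failures are abandoned for this pass and returned so they can be logged.
    They are retried on the next pass simply because they are still incomplete.
    Deletion events are deliberately ignored."""
    os.makedirs(archive_dir, exist_ok=True)
    failed: List[str] = []
    for obj in objects:
        if is_complete(archive_dir, obj):
            continue
        path = os.path.join(archive_dir, obj.object_id)
        try:
            data = download(obj.object_id)
            with open(path, "wb") as f:
                f.write(data)
        except Exception:
            # Network or I/O error: give up for now, retry on a later pass.
            failed.append(obj.object_id)
            continue
        # A truncated or oversized payload leaves a size mismatch, so it is retried later.
        if not is_complete(archive_dir, obj):
            failed.append(obj.object_id)
    return failed


def run_forever(
    archive_dir: str,
    list_objects: Callable[[], Iterable[DataObject]],  # hypothetical chain query
    download: Callable[[str], bytes],
    interval_s: float = 600.0,  # assumed retry interval, not from the issue
) -> None:
    """Repeat the sync pass at a fixed interval; restarting loses nothing."""
    while True:
        sync_once(archive_dir, list_objects(), download)
        time.sleep(interval_s)
```

Because the only state is the archive directory, the script can be killed and restarted at any time: an interrupted or truncated download just leaves a file whose size does not match the on-chain size, so the next pass fetches it again. Recovering an object then amounts to copying the file named after its ID off the host, e.g. with scp.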

At the very least, I would advocate for such a minimal-effort system, which unlocks the benefits we are looking for with minimal risk and time. It can be extended later if we run into trouble or incur costs of any kind.

Metadata

Assignees: No one assigned
Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests