Smart contracts content directory - implementation plan

This issue describes in some detail how the initial implementation of the smart-contract content directory (described in issues like https://github.com/Joystream/joystream/issues/1520) could look and how it would affect our TypeScript codebase (query node, CLI, network tests). The draft PR containing the initial contracts code can be found here: https://github.com/Joystream/joystream/pull/1752 (though this plan introduces a few changes, like the format of `metadata`, which I reconsidered while working on this issue).

Since it's not entirely clear to me yet what is feasible to implement on the runtime/node side, I tried to explore a few different possibilities, based on my experience with reference implementations like [this](https://github.com/PureStake/moonbeam/tree/moonbeam-tutorials) and the documentation I was able to find (more links are provided under the `Runtime` section below).

## Runtime

**Some related resources:**
- https://substrate.dev/docs/en/knowledgebase/smart-contracts/evm-pallet
- https://github.com/paritytech/frontier
- https://github.com/PureStake/moonbeam/tree/moonbeam-tutorials - this is the chain I have been using locally as a reference implementation.
- https://www.youtube.com/watch?v=b9R_-JPtKlM - a video that demonstrates what some basic interactions with the evm module look like (a link to a repository containing the example CLI code can be found in the description).
- https://www.youtube.com/watch?v=lXw6GTNh73Y - a more in-depth dive into evm pallet (though it's probably quite outdated now)

### We would need:

- **EVM pallet / module enabled, exposing extrinsics like `evm.call`, `evm.deposit`, `evm.withdraw` and `evm.create`.**

    For extrinsics like `evm.call`, we want the sender's substrate address to be deterministically converted to an evm-compatible address that is accessible through `msg.sender` inside the Solidity contracts.

    In the implementation that I used for the initial tests this was the standard behavior and the conversion function looked like this (this implementation can also be inspected [here](https://github.com/paritytech/substrate/blob/c72651c309c7b9bf328cd0aa55a932b5e1d373e2/frame/evm/src/lib.rs#L81)):

    ```
    impl<H: Hasher, A: AsRef<[u8]>> ConvertAccountId<A> for HashTruncateConvertAccountId<H> {
        fn convert_account_id(account_id: &A) -> H160 {
            let account_id = H::hash(account_id.as_ref());
            let account_id_len = account_id.as_ref().len();
            let mut value = [0u8; 20];
            let value_len = value.len();

            if value_len > account_id_len {
                value[(value_len - account_id_len)..].copy_from_slice(account_id.as_ref());
            } else {
                value.copy_from_slice(&account_id.as_ref()[(account_id_len - value_len)..]);
            }

            H160::from(value)
        }
    }
    ```

    On the frontend the same conversion could be done with:

    ```
    crypto.blake2AsHex(crypto.decodeAddress(substrateKeyPair.address), 256).substring(26)
    ```
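    The truncation step of this conversion can also be sketched without any Substrate dependency. The helper below is hypothetical (`truncateToH160` is not part of any library) and assumes the caller has already hashed the decoded account id; it only mirrors the branch logic of the Rust function above, keeping the last 20 bytes of the hash.

    ```typescript
    // Hypothetical helper mirroring HashTruncateConvertAccountId: keep the last
    // 20 bytes of an already-computed hash, or left-pad with zeros if it is shorter.
    function truncateToH160(hash: Uint8Array): string {
      const value = new Uint8Array(20)
      if (value.length > hash.length) {
        // Shorter input: copy it into the tail, leaving leading zero bytes
        value.set(hash, value.length - hash.length)
      } else {
        // Longer (or equal) input: take the trailing 20 bytes
        value.set(hash.subarray(hash.length - value.length))
      }
      return '0x' + Array.from(value, (b) => b.toString(16).padStart(2, '0')).join('')
    }
    ```

    For a 32-byte hash this returns the same 40 hex characters as the `substring(26)` call above, with a `0x` prefix added.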

- **Web3 rpc support** (optional) - this seems to be the `rpc-ethereum` component of https://github.com/paritytech/frontier.

    It isn't strictly required and I'm not sure how feasible it is for our node to support it, but on the frontend side this would be very helpful, because otherwise accessing contract storage variables is going to be very hard to implement (I think it would require using a very low-level `evm.accountStorages` query, which returns raw storage slot data as hex).

    Having access to this rpc would also:
    - make it easier to deploy/initialize contracts with tools like truffle without having to build custom deployment logic in order to make it compatible with substrate extrinsics (like `evm.create`)
    - allow the usage of standard ethereum keypairs for `evm` calls, which may add some interesting possibilities for smart contracts
    - give us full support of `Web3.js` and other existing js libraries, which can generally make some interactions with smart contracts way simpler (ie. subscribing to events)
    - let us run Truffle tests compatible with a Ganache node against our substrate node (though we would probably want to use the substrate extrinsics for final tests, since they'd rely on memberships etc.)

    It's not entirely clear to me how the runtime handles eth transactions vs substrate transactions/extrinsics in this setup, but it seems pretty obvious that the sender of an eth transaction will still need to pay a fee (which would be paid in "ethereum").

    Assuming the users have separate balances in `ETH` and `JOY` (which seems to be the default case - they can then convert between them using `evm.deposit` and `evm.withdraw` extrinsics), that would mean that if we want to allow them to use Web3 RPC (ie. through `Web3.js`) to deploy/call contracts using ethereum keypairs, we probably need to also allow `evm.deposit` to **any** `H160` (ethereum) account.
    
    By default the evm pallet (or at least the implementation I was using) seems to only allow deposits from a substrate account to the eth account associated with that substrate address (the result of the conversion), so there seems to be no way for users to transfer value to an eth address they actually hold the private key to.


- **Runtime storage for 2 contract addresses that can be changed through `sudo`** (`MembershipBridge` and `ContentWorkingGroupBridge`) - the runtime should be able to call into those contracts when executing some of the extrinsics in the membership and working-group modules (as explained in the next point). Ultimately we will probably want them to be set via proposal(s), but initially they can be set through `sudo`, ie. we can have extrinsics like `setMembershipBridgeContractAddress` and `setContentWorkingGroupBridgeContractAddress`.

- **"Hooks" in the `membership` and `working-groups` module**

  We'd want the runtime to execute evm calls on some specific events. For this it should use a hardcoded `msg.sender` address, referred to as the *runtime address* (ie. `0x2222222222222222222222222222222222222222`).

  We'd need the following hooks:
    - on `memberships::set_controller_account` - call `MembershipBridge.setMemberAddress(memberId, address)` (the address must always be converted to ethereum address, the same way it is converted for the purpose of `evm.call` extrinsic)
    - on `memberships::buy_membership` - call `MembershipBridge.setMemberAddress(memberId, address)` (same as above)
    - on `memberships::add_screened_member` - same as above
    - on `ContentDirectoryWorkingGroup::fulfill_successful_applications` (or `fill_opening`) - call `ContentWorkingGroupBridge.setCuratorAddress(curatorId, address)` and potentially `ContentWorkingGroupBridge.setLeadAddress(address)` (if the lead was hired)
    - on `ContentDirectoryWorkingGroup::deactivate_worker` - call `ContentWorkingGroupBridge.setCuratorAddress(curatorId, address)` and potentially `ContentWorkingGroupBridge.setLeadAddress(address)` (in this case we use an empty address: `0x0000000000000000000000000000000000000000`)
    - on `ContentDirectoryWorkingGroup::update_role_account` - call `ContentWorkingGroupBridge.setCuratorAddress(curatorId, address)` and potentially `ContentWorkingGroupBridge.setLeadAddress(address)`

  The runtime can either have the signature of those methods hardcoded (ie. `SET_MEMBER_ADDRESS_SIGNATURE = keccak('setMemberAddress(uint64,address)')`) or those can also be set via sudo (the same way contract addresses will be).
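To make the shape of such a hook call concrete, here is a minimal sketch of how the call data for `setMemberAddress(uint64,address)` could be assembled. `encodeSetMemberAddress` and `toWord` are hypothetical helpers; the 4-byte selector is taken as an input, because computing keccak256 needs an external library, and `memberId` is a plain `number` here, so ids above `Number.MAX_SAFE_INTEGER` are not covered.

```typescript
// Left-pad a hex string (without 0x) to one 32-byte ABI word.
function toWord(hex: string): string {
  return hex.toLowerCase().padStart(64, '0')
}

// Build call data for setMemberAddress(uint64,address).
// `selector` is assumed to be the first 4 bytes of keccak256 of the signature,
// computed elsewhere (ie. hardcoded in the runtime or set via sudo).
function encodeSetMemberAddress(selector: string, memberId: number, address: string): string {
  const sel = selector.replace(/^0x/, '').toLowerCase()
  const addr = address.replace(/^0x/, '')
  if (sel.length !== 8) throw new Error('selector must be 4 bytes')
  if (addr.length !== 40) throw new Error('address must be 20 bytes')
  if (!Number.isSafeInteger(memberId) || memberId < 0) throw new Error('invalid member id')
  // Both static arguments occupy one 32-byte word each, in declaration order
  return '0x' + sel + toWord(memberId.toString(16)) + toWord(addr)
}
```

The result is a 4-byte selector followed by two 32-byte words, which is what the runtime would pass as the input of the evm call.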


## Deploying the contracts

We can do this either using `evm.create` extrinsic or the web3 rpc (if it's supported), as described in the first section.

**Bridge contracts**

When deploying the bridge contracts (`MembershipBridge`, `ContentWorkingGroupBridge`), assuming we want to preserve existing memberships and contentWorkingGroup state, we need to take into account the initialization process (which was also described here: https://github.com/Joystream/joystream/issues/1677):

1. Make the runtime aware of the bridge contract addresses (set them via `sudo`)
2. Export members / content-working-group state at the block in which step `1.` was executed
3. Initialize the data in the bridge contracts (using the standard initialization process with `batchInsert`/`batchSet`)

The key here is that the bridge contracts need to handle being initialized and potentially updated by the runtime at the same time, because initialization may take multiple blocks, during which extrinsics like `membership.buy_membership` may also be executed.

The solution to this, which I explored in https://github.com/Joystream/joystream/issues/1677, seems to be ignoring any member/curator/lead address changes that are part of `batchSet` (or some other method used by the initialization script) if they refer to a member/curator/lead whose address was already set by the runtime (ie. via the `setCuratorAddress` contract method).
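The rule can be illustrated with a small in-memory model. This is TypeScript rather than Solidity, `MembershipBridgeModel` and `addressOf` are hypothetical names, and the separate "set by runtime" flag is an assumption of this sketch: it lets an empty address written by the runtime (ie. on deactivation) also take precedence over exported data.

```typescript
const ZERO_ADDRESS = '0x0000000000000000000000000000000000000000'

// In-memory model of a bridge contract's member storage (not actual contract code).
class MembershipBridgeModel {
  private addresses = new Map<number, string>()
  // Ids whose address was written by a runtime hook at least once
  private setByRuntime = new Set<number>()

  // Called by the runtime hooks; always takes effect.
  setMemberAddress(memberId: number, address: string): void {
    this.addresses.set(memberId, address)
    this.setByRuntime.add(memberId)
  }

  // Called by the initialization script with the exported state.
  batchSet(entries: Array<[number, string]>): void {
    for (const [memberId, address] of entries) {
      // The exported value is older than what the runtime already wrote
      if (this.setByRuntime.has(memberId)) continue
      this.addresses.set(memberId, address)
    }
  }

  addressOf(memberId: number): string {
    return this.addresses.get(memberId) ?? ZERO_ADDRESS
  }
}
```

With this rule the order in which initialization batches and runtime hooks arrive no longer matters for members touched by the runtime.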

**Storage contracts and logic contract**

Those don't seem to require any special initialization logic, since the content working group lead should be able to initialize the content directory to any state using just the standard contract methods (ie. `addCuratorGroup`, `createChannel`, `addVideo`, `updateChannelOwnership` etc.).

It still may be worth including a way to initialize a storage contract (ie. `VideoStorage`) to a given state and making it possible to migrate just a single storage contract (the way it was described in the initial draft: https://github.com/Joystream/joystream/issues/1520#issuecomment-724227857), to simplify introducing layout changes to a single storage contract.

## Query node (and general event/metadata handling)

My proposal here is to rely on the evm module's `Log` event.

The data that this event returns is a raw hex that can be very easily decoded with existing tools like https://github.com/ConsenSys/abi-decoder or perhaps https://www.trufflesuite.com/docs/truffle/codec/modules/_truffle_codec.html

Using the contract abi (we can get it from the `json` file created by `truffle compile`) we can convert the raw `topics` and `data` hex (part of the `Log` event) into a friendly event representation like:

```
{
  name: 'ChannelCreated',
  args: {
    _id: BN
    _ownership: { ownershipType: BN; ownerId: BN }
    _metadata: string
  }
}
```

_There is a library called [`typechain`](https://github.com/ethereum-ts/TypeChain) which is capable of, among many other things, generating `TypeScript` interfaces for events like this based on the actual solidity contracts (the code above is pretty much just copied from one of the `d.ts` files it generated)_

The query node could handle the `Log` event similarly to how it now handles events like `TransactionCompleted`, ie. execute the appropriate handlers like `createChannel`, `updateChannelProperties` etc. depending on the decoded event type. This should actually be much easier than the current approach, as there wouldn't be a need to deal with very abstract concepts like `CreateEntity`/`AddSchemaSupportToEntity` operations, `ParametrizedInputPropertyValue` types etc. The `_metadata` part of the event (which holds all values previously provided in the form of `InputPropertyValues`) will be just a json-encoded object that can be easily parsed and validated by the query node (as further described below).
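As a rough sketch of that dispatch step, assuming the `Log` event has already been decoded into a `{ name, args }` object: `createDispatcher` and the `DecodedEvent` shape are hypothetical and only illustrate routing by event name, not the actual mappings code.

```typescript
// Shape of a decoded log, loosely following the example above.
interface DecodedEvent {
  name: string
  args: Record<string, unknown>
}

type Handler = (args: Record<string, unknown>) => void

// Hypothetical registry: contract event name -> query node mapping handler.
function createDispatcher(handlers: Record<string, Handler>) {
  return (event: DecodedEvent): boolean => {
    // Own-property check so names like "toString" never resolve to a handler
    if (!Object.prototype.hasOwnProperty.call(handlers, event.name)) return false
    handlers[event.name](event.args)
    return true
  }
}
```

A handler map such as `{ ChannelCreated: createChannel }` would then be the only place that needs to change when a contract gains a new event.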

### Video/Channel metadata

Consider current Video schema (part of https://github.com/Joystream/joystream/issues/824#issuecomment-653150085):

```
type Video {
  id: ID!
  entityID: BigInteger!
  channel: Channel!
  category: Category!
  title: String!
  description: String!
  duration: Int!
  skippableIntroDuration: Int
  thumbnailURL: String!
  Language: Language
  media: VideoMedia!
  hasMarketing: Boolean
  publishedOnJoystreamAtblockHeight: BigInteger!
  publishedOnJoystreamAtTimeStamp: BigInteger!
  publishedBeforeJoystream: DateTime
  isPublic: Bool!
  isCurated: Boolean!
  isExplicit: Boolean!
  license: License!
}
```

It consists of properties of different types like `String`, `Int`, `Boolean`, some one-to-one relationships with other entities like `VideoMedia` and `many-to-one` relationships with entities like `Channel`, `Category`, `Language` etc. Some of those (like `VideoMedia`) also have their own "nested" entities (ie. `MediaLocation`).

Currently all of this is reflected by the runtime schemas (as closely as possible), but once we move to smart contracts **the entire `Video` metadata can be represented just as a json string** (which the contracts actually don't care much about, they'll just emit it in an event like `VideoCreated` or `VideoMetadataUpdated`).

We can use the json schema standard (https://json-schema.org/) to describe the expected metadata json object.
We already have similar json schemas auto-generated for entities by the `@joystream/cd-schemas` library (ie.: `/content-directory-schemas/schemas/entities`). They'd of course need to be slightly adjusted and their role would be different: instead of being auto-generated from runtime schemas, they would be the one and only source of truth about how we expect the `Channel` / `Video` metadata to look.

With json schemas it would be no problem at all to handle nested objects, enum types, "either-or" values (`oneOf`) etc., since json schemas support all those cases.

There is also a great library called `json-schema-to-typescript` that allows us to generate fully-compatible TypeScript interfaces from those json schemas (already used by `@joystream/cd-schemas`) and a json-schema validation library called `Ajv` that we can use to validate a json object against a json schema.

With all that, we can have a very customisable, fully typesafe `Channel`/`Video` metadata, that should be easy to handle on the mappings side (as described below).

#### Handling metadata on channel/video creation

I imagine the flow to be like this:
1. Query node receives metadata in the form of a json string as part of, for example, a `ChannelCreated` event
2. Query node converts the json string to a json object (if this fails - go to the `invalid metadata` case)
3. Query node validates the json object against `ChannelMetadata.schema.json` using `Ajv`
4. If the object is valid - we can TypeScript-assert that the metadata object is of the given interface type, ie. `const metadata = metadataJsonObj as ChannelMetadata`. Since the `ChannelMetadata` interface is auto-generated from the json schema, this approach should be 100% typesafe and further handling should be very easy.
5. If the metadata is invalid (either not valid json at all or failing validation against the json schema) - the query node could store an error log, so that the user who tried to add the content can query it and see why it was rejected. Smart contracts do not differentiate between a valid-metadata and an invalid-metadata `Video`/`Channel`, so it's still possible that contract methods like `updateChannelMetadata`, `addVideo` or `updateChannelOwnership` will be executed in the context of an invalid-metadata channel. This means the query node should probably store some representation of an invalid-metadata channel regardless (it will then just have an `id` and `owner`) and allow it to become "valid" once `updateChannelMetadata` is executed (in the context of this channel) with a full (not partial!), valid metadata object.
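The flow above can be sketched as follows. To keep the example dependency-free, a hand-written type guard stands in for the `Ajv`-compiled validator, and the two-field `ChannelMetadata` interface is a hypothetical subset of what would be generated from the json schema.

```typescript
// Hypothetical subset of the generated ChannelMetadata interface.
interface ChannelMetadata {
  title: string
  description: string
}

type ParseResult =
  | { valid: true; metadata: ChannelMetadata }
  | { valid: false; error: string }

// Stand-in for an Ajv validator compiled from ChannelMetadata.schema.json.
function isChannelMetadata(value: unknown): value is ChannelMetadata {
  if (typeof value !== 'object' || value === null) return false
  const obj = value as Record<string, unknown>
  return typeof obj.title === 'string' && typeof obj.description === 'string'
}

function parseChannelMetadata(raw: string): ParseResult {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw) // step 2: json string -> json object
  } catch {
    return { valid: false, error: 'metadata is not valid JSON' }
  }
  if (!isChannelMetadata(parsed)) {
    // step 5: keep the reason so it can be stored as an error log
    return { valid: false, error: 'metadata does not match the schema' }
  }
  return { valid: true, metadata: parsed } // step 4: narrowed to ChannelMetadata
}
```

Returning a tagged result instead of throwing lets the mapping store the invalid-metadata channel and the error message in the same code path.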

#### Handling channel/video metadata updates

The updates are a bit more tricky to handle, since in this case the metadata json object will not be "complete".
This means we would need to "merge" the update with existing metadata before doing validation.
It's not always obvious how to perform this merge. Let's take a look at an example:

Imagine a `Video` update object like this:

```
{
  media: {
    location: {
      httpMediaLocation: {
        url: "http://example.com/video.mp4"
      }
    }
  }
}
```

And let's assume the current Video metadata object looks like this:

```
{
  title: "Some title",
  description: "Some video description",
  media: {
    width: 1080,
    height: 768,
    ...
    location: {
      joystreamMediaLocation: {
        contentId: "5C61WpAUiceHS8yTWxkncHYhxbAuoqspesiZJooQWSompu98"
      }
    }
  }
}
```

The json schema can expect the location to have either `joystreamMediaLocation` or `httpMediaLocation` property (`oneOf`), but if we merge those objects recursively we would end up having both of them specified at once.

On the other hand if we just do a "shallow" merge (assign a new value for `media` object), we'd lose other properties of `media` like `width` and `height`.

There are of course ways to solve this. In the worst-case scenario we can probably just get away with a shallow merge, forcing the updater to specify the entire `media` object even when they only want to update a single deeply nested property.

We could also enforce that, in a scenario like this, the update explicitly sets the location variant being replaced (here `joystreamMediaLocation`) to `null` and the new one (here `httpMediaLocation`) to a valid object. Then on the json-schema level we would require that either `httpMediaLocation` or `joystreamMediaLocation` is always set to `null` (this should also be possible to do with `oneOf`).
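A minimal sketch of the recursive merge under this explicit-`null` convention is shown below. `mergeMetadata` is a hypothetical helper, not existing query node code: nested objects are merged key by key, while any other value in the update (including `null`) replaces the current one.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Recursively merge `update` into `current`; neither input is mutated.
function mergeMetadata(current: JsonObject, update: JsonObject): JsonObject {
  const result: JsonObject = { ...current }
  for (const key of Object.keys(update)) {
    const next = update[key]
    const prev = result[key]
    if (isPlainObject(next) && isPlainObject(prev)) {
      result[key] = mergeMetadata(prev, next) // descend into nested objects
    } else {
      result[key] = next // scalars, arrays and null replace the old value
    }
  }
  return result
}
```

Applied to the example above, an update of `{ media: { location: { joystreamMediaLocation: null, httpMediaLocation: { url: ... } } } }` keeps `width` and `height` and leaves exactly one non-null location, so the merged object can then be validated against the schema as a whole.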

Another important thing about the updates is that **metadata updates have no effects on smart contracts themselves**, which means the query node can safely ignore any invalid update (and at best - just store an error log).

#### Contract address

Since migrations and upgradability are not the focus of the initial implementation, the query node (processor) can probably just import a static `ContentDirectory` contract abi and use a contract address provided in `ENV` (there is also one other possible approach, further described in the `CLI` section below).

Ultimately (in the future), depending on the upgrade strategy we choose, the query node may need to be able to switch this address to a different one on some event (ie. `Upgraded`/`Migrated`), but it may also be the case that the contract address remains unchanged even when the contract is upgraded (the proxy pattern).

## `@joystream/cd-schemas` library

In the scenario described above (in the `Query node` section), the role of this library would be reduced to just storing `ChannelMetadata` and `VideoMetadata` json schemas and the TypeScript interfaces auto-generated from them.

That would mean removing a lot of functionality related to conversion between one schema representation and the other (ie. runtime schemas to json schemas), scripts related to content directory initialization, `InputParser` that handles parsing json objects to `Transaction` operations etc.

I think in that case it may make sense to just replace it with one consumable library called `@joystream/content-directory` that stores everything related to the new, smart contract content directory implementation, namely:
- the actual smart contracts
- smart contract deployment and migration scripts
- smart contract build artifacts (`json`) generated by truffle (they contain `abi`s, addresses of the deployed contracts and everything else the other projects may need)
- types generated via typechain (based on smart contracts code)
- Channel/video metadata json schemas
- Channel/video metadata TypeScript interfaces (generated from json schemas)
- perhaps some utilities to simplify contract method calls from the frontend

## CLI

The CLI is currently completely independent from the query node and initially I thought that it would be best to keep it this way, but due to the way channel/video metadata is handled by smart contracts, such integration may actually make sense (this is further explored at the end of this section). 

There is a set of commands like `createChannel`, `uploadVideo`, `curateContent`, `createCuratorGroup`, `addCuratorToGroup` etc. already implemented for Babylon release and it will probably not be very hard to update them to work with smart contracts instead of the `contentWorkingGroup` runtime module.

Some content-directory specific commands like `classes`, `removeEntity`, `createClass`, `addClassSchema` can be removed completely.

The integration with metadata json schemas should be quite straightforward, since the CLI already uses a similar approach.
There is a class called `JsonSchemaPrompter` that allows us to prompt for data based on a json schema.
It can be used to provide a convenient (as far as CLIs are concerned) UI for inputting the metadata and validating it in real time using `Ajv`. That means there also shouldn't be a need to change the code inside the CLI itself when the underlying metadata json schema changes.

#### Extrinsics / calls

Calling a contract method "through the runtime" is a rather simple process. Here is a sample script that does this:

```
import Web3 from 'web3'
import { ApiPromise, WsProvider } from '@polkadot/api'
import ContractJson from '../build/contracts/ContentDir.json'

const CONTRACT_ADDRESS = ContractJson.networks[43].address

// No provider is needed: web3 is only used to ABI-encode the call data
const web3 = new Web3()

const provider = new WsProvider('ws://127.0.0.1:9944')
const api = await ApiPromise.create({ provider })

const contract = new web3.eth.Contract(ContractJson.abi)
// ABI-encoded call to removeVideo(1)
const callData = contract.methods.removeVideo(1).encodeABI()
const substrateTx = api.tx.evm.call(
  CONTRACT_ADDRESS,
  callData,
  0,          // msg.value
  4294967295, // gas limit
  1,          // gas price
  null        // nonce (optional)
)
```

_We can leverage `TypeChain` to decorate `contract.methods`, making them fully typesafe._

The only issue here is that **executing such a call has a gas cost, which the user has to pay from their "Ethereum balance"** (each account has two separate balances, as described in the EVM pallet documentation: https://substrate.dev/docs/en/knowledgebase/smart-contracts/evm-pallet). We would therefore need to either provide the user with commands to deposit to and withdraw from their ETH balance (and possibly check this balance at the beginning of commands like `createChannel`, at least to warn if it's empty), or perform this conversion "automatically" (by either estimating the gas cost or temporarily converting some predefined value, for example `1 JOY`, before each `call`).
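For the balance check, a small sketch of the arithmetic involved is given below. It assumes the fee charged for a call is bounded by gas limit times gas price plus `msg.value`, and it treats all amounts as plain numbers in the same EVM balance unit; `maxCallCost` and `requiredDeposit` are hypothetical helpers, not existing CLI functions.

```typescript
// Upper bound on what a single call can take from the EVM balance,
// assuming the fee never exceeds gasLimit * gasPrice.
function maxCallCost(gasLimit: number, gasPrice: number, value = 0): number {
  return gasLimit * gasPrice + value
}

// Amount that would have to be deposited before the call; 0 if the balance suffices.
function requiredDeposit(evmBalance: number, gasLimit: number, gasPrice: number, value = 0): number {
  return Math.max(0, maxCallCost(gasLimit, gasPrice, value) - evmBalance)
}
```

A command like `createChannel` could warn the user, or trigger `evm.deposit`, whenever `requiredDeposit` returns a non-zero amount.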

#### Reading data directly from the contract (ie. public variables)

As described in the first section, I think for this we would actually need `rpc-ethereum` on the runtime side
and `Web3.js` on the frontend, otherwise we would have to read and decode raw hex data from contract storage slots,
which is quite complex, very error-prone (highly dependent on the solidity code itself) and hard to implement, ie.:
```
    const slot = "0";
    const mapStorageSlot = slot.padStart(64, '0');
    const mapKey = bobEvmAccount.toString().substring(2).padStart(64, '0');

    const storageKey = web3Utils.sha3('0x'.concat(mapKey.concat(mapStorageSlot)));
    const accountStorage = (await api.query.evm.accountStorages(contractAddress, storageKey)).toString();
    /* ... */
``` 
_An example of retrieving a value from a mapping in an example ERC20 contract (https://github.com/paritytech/frontier/blob/master/template/examples/contract-erc20/create-erc20.ts)_


With web3 we can achieve the same with just a single line of code like:
```
const data = await contract.methods.accountTokenBalance(account).call();
```

#### Displaying video/channel metadata within CLI

For this, relying on the query node actually seems unavoidable: smart contracts would not store this data (it is only emitted in events), so the CLI would otherwise need to do the same work as the query node does in order to establish current values (ie. find the related `ChannelCreated` event, then all `ChannelMetadataUpdated` events, and apply those subsequent updates).

This means the CLI can either:
- only display information it can fetch from smart contracts. For `Channel` that would be: `id`, `owner`, `status` (active/censored), for videos: `id`, `channelId` and `status`.
- use query node to query `Channel` and `Video` metadata (this may have some value, although it's already very easy to just query the query node directly for this via the exposed web UI)
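To show how much work the "do it without the query node" route would involve, here is a simplified sketch of replaying events for one channel. The `MetadataEvent` shape and `replayChannelMetadata` are hypothetical; a shallow merge is used only for brevity, and unparsable payloads are simply skipped.

```typescript
interface MetadataEvent {
  name: 'ChannelCreated' | 'ChannelMetadataUpdated'
  channelId: number
  metadata: string // json string as emitted by the contract
}

type Metadata = Record<string, unknown>

function tryParse(raw: string): Metadata | undefined {
  try {
    const value: unknown = JSON.parse(raw)
    return typeof value === 'object' && value !== null && !Array.isArray(value)
      ? (value as Metadata)
      : undefined
  } catch {
    return undefined
  }
}

// Fold the events of one channel (assumed to be in block order) into its current metadata.
function replayChannelMetadata(events: MetadataEvent[], channelId: number): Metadata | undefined {
  let state: Metadata | undefined
  for (const event of events) {
    if (event.channelId !== channelId) continue
    const parsed = tryParse(event.metadata)
    if (!parsed) continue // invalid payloads have no effect
    if (event.name === 'ChannelCreated') state = parsed
    else if (state) state = { ...state, ...parsed } // shallow merge, for brevity only
  }
  return state
}
```

Even this reduced version needs the full event history of the channel, which is exactly what the query node already indexes.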

#### Handling changing contract address(es) and abi(s)

When Truffle contract migrations (deployments) are run, the framework creates/updates "build artifacts", which are json files containing a lot of useful information about the contracts, ie.: the current address of each of the contracts depending on `network` (ie. `development`, `test`, `live`) and the contract `abi` that both the CLI and the query node need in order to be able to decode the events, encode calls etc.

We probably don't want to commit the build artifacts themselves to our repository, but if we assume we always use truffle for deployments and the CLI imports current build artifacts like this:

```
import ContractJson from '@joystream/content-directory/build/contracts/ContentDirectory.json';

const CONTENT_DIRECTORY_CONTRACT_ADDRESS = ContractJson.networks[networkID].address
``` 

We wouldn't need to set anything manually after (re)deploying the contracts to a development node locally / during tests.

If the published version of `@joystream/content-directory` then contains build artifacts with addresses of deployed production contracts (which means we would of course have to deploy contracts first and then publish the library), the users of the CLI also wouldn't need to manually set them after they install the CLI.

In that case the CLI can of course still allow the users to customize those addresses via commands like `content-directory:setLogicContractAddress`, `content-directory:setChannelStorageAddress` etc.
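The lookup with an optional user override could look roughly like this. `resolveContractAddress` is a hypothetical helper and `TruffleArtifact` only models the two artifact fields used in this issue (`abi` and `networks[id].address`).

```typescript
// Minimal shape of a Truffle build artifact, limited to the fields used here.
interface TruffleArtifact {
  abi: unknown[]
  networks: { [networkId: string]: { address: string } | undefined }
}

// Prefer an address set by the user; otherwise fall back to the build artifact.
function resolveContractAddress(
  artifact: TruffleArtifact,
  networkId: number | string,
  override?: string
): string {
  if (override) return override
  const deployment = artifact.networks[String(networkId)]
  if (!deployment) {
    throw new Error(`No deployment recorded for network ${networkId}`)
  }
  return deployment.address
}
```

Failing loudly when the artifact has no entry for the current network avoids silently sending calls to a stale address.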

## Integration tests

During the integration tests we would need to ensure a few additional steps are executed before any interaction with smart contracts will be possible:
- contracts are deployed (ie. using `truffle migrate` or a custom deployment script if no web3 support is available)
- bridge contract addresses are set in the runtime via sudo (**this should be executed before any membership module extrinsics, unless we want to always do the export&initialization part described in `Deploying the contracts` section**)
- addresses that send `evm.call` extrinsics need to have enough evm-balance to pay gas fees (this balance can either be initialized with `evm.deposit` or via genesis config)

Besides that I suspect the effects on current integration tests would be similar to those on the CLI, mostly we would need to replace all `api.tx.contentDirectory.transaction(...)` extrinsics with corresponding `api.tx.evm.call(...)` extrinsics.

The tests are already importing `Channel`/`Video` entity TypeScript interfaces from `@joystream/cd-schemas` and using them for `Channel` / `Video` creation and updates (the library exposes a converter that parses such inputs into `Transaction` extrinsic input). This means that the new format of input objects used to create / update `Channels` and `Videos` (the `metadata` object) will not be very different from the one that is currently used (for smart contract methods we would also provide objects like: `{ title: 'Video', description: 'An example video', media: { width: 800, height: 600 } }`).

Current contract addresses can be established within network tests similarly to how it was described in the `Handling changing contract address(es) and abi(s)` section of `CLI`, assuming the CI would also use the standard `truffle deploy` flow, which would update the `json` build artifacts inside `@joystream/content-directory` in the monorepo.

In the early phase (before all runtime changes are included) there will be a separate set of unit tests designed specifically for smart contracts (for example, see: https://github.com/Joystream/joystream/issues/1681) that can be run against a Ganache node and don't require any interactions with the substrate runtime. It may be possible to merge some of those tests into our integration tests (that would depend on whether we support web3 rpc), but keeping them separate also seems quite valuable.
