diff --git a/badal/dsl/README.md b/badal/dsl/README.md new file mode 100644 index 0000000..2c261b0 --- /dev/null +++ b/badal/dsl/README.md @@ -0,0 +1,288 @@ +# The Badal DSL + +_This is an early-stage, incomplete, and unreviewed draft. Anything can change at any time_ + +## Introduction + +The idea of Badal.DSL is to provide users with a high-level language +to specify schemas. This not only would make it easy for us to +quickly implement different schemas for demos and PoCs, but more +importantly, it would make it easier for someone to understand schemas +in Badal by abstracting away the low-level details in badal.schema and +allowing them to focus on the high-level concepts and "business logic". + +The "DSL" is really just a set of Python classes which use +metaprogramming to generate/call the underlying badal.schema code +based on a high-level description provided via the classes and +attributes of the DSL. This is similar to the way Django ORM abstracts +away the low-level RDBMS code by using the `Model` class and related +classes. + +## Basic DSL Classes + +The DSL is primarily based on these classes: Attribute, State, Claim, Proof, Transaction, and Schema. + +The Attribute and State classes abstract away the details of +`add_attribute_type` and `add_state_type` from `badal.schema`. Here is +an example of a `State` definition using the DSL (with only a few +important data-members shown): + + :::python + class Utxo(dsl.State): + owner_id = dsl.PublicID(scheme='g16') + amount = dsl.Amount(uom='inr', precision=3) + +This code will result in two `add_attribute_type` calls and one +`add_state_type` call. + +The `Claim` class allows the specification of ZKP code that needs to +be incorporated in the proof for the transaction. All the claims of a +transaction together should guarantee that the transaction is valid. + +The `TransactionCore` dataclass contains data-members specifying the +input state types and output state types for this transaction. 
+ + :::python + @dataclass + class TransferCore(dsl.TransactionCore): + inputs = dsl.Array(Utxo, type="input", max_length=2) + outputs = dsl.Array(Utxo, type="output", max_length=2) + # note: these are only input and output states + # no other attributes are allowed in a TransactionCore + + +The `Transaction` class puts it all together by extending the +`TransactionCore` by adding more data-members (specifically, the +data-members that are not incorporated in the signatures) and methods. +Here is an example of a `Transaction` definition (with only a few +important data-members shown): + + :::python + class Transfer(TransferCore, dsl.Transaction): + # inherits the data members of TransferCore + claims = (AmountsMatchClaim, UtxoTypesMatchClaim) + signatures: list[Signature] + input_hashes: list[StateHash] + output_hashes: list[StateHash] + proof: Proof + +One important function of these classes is to automatically create code in the chosen ZKP language (_e.g._ Zokrates or circom) to create appropriate data-structures for each attribute type, for each state type, and for the transaction core. They also create functions to initialize and manipulate these data-structures. For example, when using Zokrates, this would result in the creation of `struct owner_id_t`, `struct amount_t`, `struct utxo_t`, and `struct transfer_core_t`, along with functions `init_owner_id`, `init_amount`, `init_utxo`, and `init_transfer_core`. + +## Details of `State` + +A state has the following important properties: + +- method: `hash() -> StateHash` returns a hash of the contents of the state + - The hash function used will be collision resistant, preimage + resistant, and second preimage resistant + +- method: `owners() -> list[PublicId]` + - This method returns a list of values of attributes in this state + that represent _owners_ of this state. The signatures of all of + these are required for a transaction to be able to cancel this + state. 
+ - Note: every state must have at least one owner + - Note: there can be more than one owner + - + - TODO: we should also allow k-of-n ownership, but the details of + how to do that need to be worked out + - Note: It is not necessary that all the `PublicId` attributes in a state + are owners. Specifying `non_owner=True` in the attribute definition + indicates that this is a `PublicId` attribute but does not represent an + owner and thus need not sign a transaction canceling this state + +- TODO **nonce**: we need a mechanism to ensure that no two states can + result in the same `StateHash`. Because of the properties of the + hash function this can only happen if all the values of the + attributes in the state are the same as that of another state. + - Note: This can be easily guaranteed by embedding a GUID in + each state. However, we need to be careful with stuffing + data members in states because each extra byte in the state + increases the cost of ZKP generation + - What is the smallest nonce we can include in a state to guarantee + uniqueness? + - Specifically, a GUID or a timestamp might actually not provide + strong guarantees of preimage resistance and we might necessarily + need a cryptographic nonce? + - Does each state contain a timestamp? We should avoid this if possible + because it makes proofs more expensive without really giving us + a cryptographically safe nonce + +- At the end of the transaction, the entire contents of each output + state (including the nonces) need to be sent by the transaction + creator ("sender") to the state owners + +## Details of `Transaction` + +At the wallet provider, the Transaction class has the following metadata: + +- TransactionCore: All the data-members from the transaction core, + which includes: + - input data-members: In the example above, there is just one data-member + `inputs` which has all the input states because they are all of + the same type (`Utxo`). 
However, it is possible to have multiple + data-members with different state values. Each represents a state being + canceled. + - output data-members: Similarly, one or more data-members representing + the `outputs`. + - whether a state represents an input or an output is deduced from the + type declaration + - method: `hash() -> TCHash` returns a hash of the contents of the + `transaction_core` + - The hash function can be the same as that used for `State.hash` + - The `StateHash` and `TCHash` types can be the same. + +- claims: Tuple of names of classes representing the claims being made in this + transaction. Each claim class will contribute code to the ZKP program + for this transaction + +- method: `get_owners(creator: PublicId) -> list[PublicId]` + - a method returning a list of `PublicId`s whose signatures are + needed for the transaction to be valid. The `0`th element + of this list is guaranteed to be the transaction creator. + - The reason for special treatment for the transaction creator + is that we don't need a signature for the transaction creator. + A direct proof of knowledge of the private id is good enough + for the creator since the creator's private id can be provided + to the ZKP program as a private input. The same cannot be done + for the other owners. + - If there are duplicates among the owners of the different input + states, then this method will remove duplicates + - TODO: the details of how this works need to be worked out + +- method: `get_zkp_program() -> str` + - a method returning the (automatically generated) ZKP program + corresponding to this transaction + - See section `Proof` for details of what this ZKP program contains + - the `proof` for a specific transaction instance will be the result + of running this program with the private data of that transaction + +- method: `get_transaction_core() -> TransactionCore` + - returns an instance of the `TransactionCore` populated with the + actual input and output state instances. 
The `hash` of this + transaction core is signed by the owners for validating a transaction. + This transaction core and the corresponding signatures are the + private inputs to the ZKP program + - TODO: it should return `TransferCore` not `TransactionCore` + Need to doublecheck that this doesn't cause any problems. + +- method: `get_zkp_program_inputs() -> list[str]` + - a method returning the commandline arguments to be provided to the + `get_zkp_program()` + - Question: should this be json instead of `list[str]`? + - At a minimum, this includes the `transaction_core` and `signatures` + and `creator_private_id` as the private inputs and `input_hashes` and + `output_hashes` as public inputs + - Question: does this _have_ to be overridden by any subclass of the + `Transaction` class? Probably not: so this method can be common and + moved into the `Transaction` class. + +## Details of `Signature` + +TODO: we need to figure out what signature scheme we're using + +The `get_owners` method has the following properties: + +- The `get_owners` method creates a list of `PublicId`s whose + signatures are needed for a transaction to be valid. +- All the `owners` will only come from the `inputs`. The `outputs` do + not contribute to the `owners`. +- Each `input` state might contribute 0, 1, or more `PublicId`s to the + list. It is possible that two different states have the same owner + so they contribute only one unique `PublicId`. It is possible that some + state might contribute two or more. + - TODO: We should extend this to allow more complex things like + 2-of-3 signatures +- The 0th element of this list is the transaction creator, who does not + provide a signature (as discussed earlier) + +The `signatures` list has the following properties: + +- The `proof` will construct the owners array and prove that every + owner in every state exists in the owners array +- The `proof` will `assert HMAC(creator_private_id, 0) == owners[0]`. 
+- The `proof` will also assert that + `verify_signature(transaction_core_hash, self.get_owners()[i], + signatures[i])` returns true for all i > 0. Note: the proof only + verifies the signature, so it only needs to know the `PublicId`s + of each owner, not the `PrivateId`. +- All signatures will have to be collected via out-of-band means. + Badal.DSL does not have a way of creating the signature of the + transaction. Typically, Badal code will output the `transaction_core` + and the `get_owners()` list. The user has to generate/collect the + corresponding signatures and input them into the system. As a result, + the `PrivateId` is never directly used in the ZKP program. This allows + a transaction creator to collect signatures of other parties + whose `PrivateId` is not known/revealed to the transaction creator. + + The wallet provider will usually have methods to save private ids + and sign transactions, but for now we'll assume that is a separate + library. + - For now, we're assuming that it is the job of the transaction + submitting wallet provider to collect all the signatures via + out-of-band methods + +## Details of `Proof` + +The ZKP program auto-generated for a transaction class has the +following components in it: + +### Code common to all transactions: + +- compute the `StateHash` for each input and assert that it is + equal to the corresponding `input_hashes[i]`. 
+- do the same for `output_hashes` +- construct the owners list +- for each owner in each state assert that it is included in the owners list +- for each i > 0: assert that `verify_signature(transaction_core_hash, + owners[i], signatures[i])` returns true +- assert `HMAC(creator_private_id, 0) == owners[0]` +- assertions related to the chaining pointers if necessary (see + comment at Notary) + + +### Code specific to this transaction type: + +- Code for each `Claim` in the transaction + - each claim gets the `transaction_core` as input and the `Claim` + definition contains the code to extract the relevant information + from the `transaction_core` and assert the appropriate constraint + - remember, the same `transaction_core` that is used in the claims + is also used in computing the `StateHash`es, thus ensuring validity + of the transaction + +## Notary + +At the notary, one transaction has the following data: + +- inputs: `list[StateHash]` corresponding to states being canceled + - Note: `StateHash` will always be unique. See discussion under + `State` for more details of this. +- outputs: `list[StateHash]` corresponding to states being created +- TODO: chaining pointers: + - Additionally, the transaction on the ledger probably needs + to include state hashes of earlier states or hashes of earlier + transactions for chaining purposes. The chaining would help + maintain integrity of the ledger. The details of this need to + be worked out. 
+ - TODO: Figure out whether the chaining is done by the wallet provider + or the notary or both +- proof + +The notary does the following: + +- Confirm that `inputs` represent active states on the ledger, meaning + that for each input: + - confirm that it exists in the `outputs` of one and only one of the + previous transactions, and + - it does not exist in the `inputs` of any previous transaction +- Confirm that `outputs` represents new states: + - this means confirming that none of the outputs exist in + the inputs or outputs of any of the previous transactions +- Verify the proof +- Add a timestamp to the transaction +- Hash the transaction and use that as a `transaction_id` +- Sign the transaction + +The signed transaction is appended to the ledger and the +`transaction_id` is returned to the caller diff --git a/badal/dsl/schema.py b/badal/dsl/schema.py new file mode 100644 index 0000000..26f779b --- /dev/null +++ b/badal/dsl/schema.py @@ -0,0 +1,194 @@ +# WARNING: Obsolete. This code is not consistent with the +# latest thinking. See badal/dsl/README.md for the +# latest thinking + +from __future__ import annotations + +from typing import Any, TypeGuard + +from badal.schema.attribute_types import AttributeType, Visibility + +class Attribute: + uri: str = '' + id: str = '' + name: str = '' + + def __init__(self, + required: bool = False, + visibility: Visibility = Visibility.Private, + **kwargs): + self.attribute_type = AttributeType( + self.uri, self.id, self.name, visibility) + + def add_to_statemeta(self, cls, name: str) -> None: + '''Register attribute with `name` on the `cls`''' + self.name = name + self.state_class = cls + cls._meta.add_attribute(name, self) + + def to_python_obj(self): + '''return in-memory representation of this attribute value + + i.e. return a Python object for this value + ''' + ... + + def to_json(self) -> str: + ... 
+ + +def is_attribute(attr_value: Any) -> TypeGuard[Attribute]: + '''Check if attr_value is an Attribute + + Note: in Django, the similar check isn't isinstance(a, Field) + They do it using hasattr(v, 'contribute_to_class'). + We could have done hasattr(v, 'add_to_statemeta') which is + technically more pythonic, but I think that would unnecessarily + add complexity to already complex code + ''' + return isinstance(attr_value, Attribute) + + +class State: + + + ... + + +class ZKPSystem: + ... + + + +class Claim: + '''A claim represents a guarantee regarding a transaction + and is specified via code in the ZKP language being used + + Conceptually: A claim is the compile time information describing + what needs to be proved. At runtime a claim accesses the actual + (private) data of the transaction to produce a proof. + + Actually, what happens is this: At compile time, the code for the + claim gets included in the ZKP program associated with the + transaction. Thus, the ZKP program consists of: 1. some + initialization code to take the private and public inputs + and put them in a common structure that the other parts of + the proof can access, 2. the code for each of the claims in + the transaction which will access the common data from #1, + 3. code for claims that are common to all transactions + (for example, computing the hashes of all the input and + output states and asserting that they match the corresponding + public hashes on the ledger), and 4. a "main program" which + calls all this code. + + + Thus, at runtime, when the ZKP program is executed to produce + the proof, automatically the code for every claim gets executed + and thus the proof proves all the claims in the transaction. + ''' + ... + + +class Proof: + '''A proof is the output of running the ZKP program. + + For more details see the comment at Claim + ''' + ... + + +class ZokratesClaim(Claim): + Array: type + MultiArray: type + ... 
+ + +class StateHashesMatchClaim(ZokratesClaim): + function_name = 'state_hashes_match' + code = ''' + def state_hashes_match(TxnBinary txn_binary, + field[2][N] input_hashes, + field[2][M] output_hashes): + for u32 i in 0..N do + ppl_signature(txn_binary.state[i]) + endfor + ''' + + +class ZokratesProof(Proof): + ... + + +class Signature: + ... + + +class Transaction: + ''' + Every transaction has a list of input states (which will be canceled), + a list of output states (which are being created), and a + list of claims (for which proofs need to be submitted) + + In addition to the listed claims, every transaction has an implicit + ''' + ... + inputs: list[State] + outputs: list[State] + + claims: tuple[Claim] + signatures: list[Signature] + + def create_txn_binary_type(self): + ''' + Create transaction binary datatype + + For example, for Zokrates, this creates the TxnBinary struct + ''' + + def create_txn_binary(self): + ''' + Creates transaction binary data which can be passed to + the individual proof functions + ''' + + def create_ZKP_program(self): + ''' + Create the ZKP program corresponding to this transaction type + + This is a compile time operation + + This creates the following pieces: + 1. Defines the "type" (if any) for the whole transaction type. + This could involve defining the types for each + individual state type. + 2. Create the function which takes all the state data items + and initializes an instance of the transaction type. This + could involve defining the init functions for each state + type + 3. Create the functions for each claim of this transaction + 4. Create the main function which takes all the inputs + and calls the functions in #2 and #3 + ''' + ... + + + + ... + + +class Action: + ... 
+ + +class Spec: + uri: str + name: str + version: str + contract_model: str + proof_model: str = 'Zokrates' + depends_upon: list[Spec] = [] + + attributes: list[Attribute] + transactions: list[Transaction] + # attrs, states, transactions, actions + diff --git a/badal/zkp/README.md b/badal/zkp/README.md new file mode 100644 index 0000000..8702c8a --- /dev/null +++ b/badal/zkp/README.md @@ -0,0 +1,300 @@ +# Zero-Knowledge Proof Systems in Badal + +_This is an early-stage, incomplete, and unreviewed draft. Anything can change at any time_ + +This document assumes that you're familiar with the contents of [badal/dsl/README.md](../dsl/README.md). + +## Introduction + +A core mechanism in Badal is that most of the data in a transaction stays private (known only to the entities that participate in the transaction) and only hashes and some other metadata are stored on the ledger. To guarantee validity of all the transactions and prevent double-spends, zero-knowledge proofs are used by the transaction submitters (wallet providers and end users) to provide the appropriate guarantees. + +Badal is intended to be a flexible system that supports various ZKP protocols (zk-SNARKs vs zk-STARKs vs BulletProofs), ZKP languages/libraries (_e.g._ Zokrates vs circom), proof systems (_e.g._ Groth16 vs Marlin), and other configurable implementation details of those proof systems (_e.g._ ALTBN_128 vs BLS12_381 curve, or SHA256 hash vs Poseidon hash). + +For the rest of this document, we will use the term ZKP parameters to refer to all the different configuration options where multiple choices are possible. + +Any one schema in Badal is required to use only one configuration of the ZKP parameters. This is specified once at the time of schema creation and cannot be modified after that. (TODO: figure out what it takes to support modification of ZKP parameters. For example, what happens if a vulnerability is discovered in one of the hash functions we're using? 
How does a schema recover from this?) + +## Building Blocks + +For the ZKP system in Badal, we need to decide on the following key building blocks: + +- One hash function: to hash states +- Three digital signature systems: one for notary to sign transactions being put on the ledger, another for a transaction creator to sign the transaction, and the third for the other owners in a transaction to sign the transaction. (The second and third could be combined.) +- Ability to create ZKP programs and proofs: this requires deciding the ZKP protocol, the high-level ZKP language/library, the ZKP system to be used, and other ZKP parameters + +These are described in detail in the following sections. + +### Hash Function + +Badal requires the use of a hash function which is collision resistant, preimage resistant, and second preimage resistant. This is used in hashing `State`s to get `StateHash`es. The `State` data is private data which is only saved by the wallet providers or end-users while the `StateHash` is publicly stored on the ledger. + +The hash function needs to be ZKP friendly. _i.e._ the time and space requirements for implementing the hash function in the ZKP system picked need to be reasonable. It is likely that which hash functions are acceptable is conditional upon which ZKP language/library and which ZKP system are chosen. + + +Supported Hash Functions + +| Functions | Languages/Libraries | ZKP Systems | Comments | +|-----------|---------------------|-------------|-------------------------------------------------------------| +| SHA256 | Zokrates, ?? | All?? | Used in ZCash with an optimized circuit but still expensive | +| Pedersen | ?? | ?? | ?? | +| Poseidon | ?? | ?? | ?? | +| MiMCHash | ?? | ?? | ?? | + +TODO: Fill in the `??` in this table. Also, add more rows? + +### Digital Signatures, Public ID, Private ID + +Badal uses two different types of digital signatures: standard digital signatures and ZKP digital signatures. 
+ +A standard digital signature is used by a Notary to sign a transaction +being appended to the ledger. In this case, the "message" is the +public fields of the transaction (i.e. the `StateHash` values and other +public fields) to be signed by using the Public/Private ID of the +Notary. The message, the signature, and the Public ID of the notary +are publicly known. + +A ZKP digital signature is used by a wallet or entity to provide a ZKP +to the Notary that a transaction is approved by the entity. Usually, +this would involve proof of knowledge of the Private ID associated +with the Public ID of the wallet or entity. (Question: is it possible +to have a different system?) + +In both cases, picking a digital signature system automatically +determines the formats of the Public and Private IDs + +#### Standard digital signature system + +Every Notary has a well known Public ID and a secret Private ID. All +transactions appended to a ledger by a notary are digitally signed by +the Notary using its Private ID. + +The system must satisfy the standard guarantees of digital signatures: + +- It should be possible to verify the validity of a signature given + just the document and the Public Id +- It should not be possible to sign a document without knowing the Private ID +- It should not be possible to extract the Private Id from a large set + of messages and signatures for a given Public Id +- It must be resistant to an existential forgery attack. + +Note: this digital signature system does not have to be ZKP friendly + +TODO: Decide which system(s) will be supported by Badal. + +#### ZKP digital signature system + +In Badal, a `PublicId` is used in `State`s to refer to entities or +wallets. Note: although it is called a `PublicId`, usually it will be +a part of the _private_ information in a state/transaction which is +known to all the transaction participants but is not put on the +ledger. 
+ +Every transaction in Badal has a set of "owners", referred to by their +Public IDs, and for a transaction to be considered valid, we need +proof that each owner approves of this transaction. + +_Discussion_: In a ZKP system it is possible for a transaction creator +to signal their own approval for the transaction without actually +using a standard digital signature system. The ZKP that the +transaction creator submits to the Notary along with the transaction +can contain a proof of knowledge of the Private ID of the transaction +creator. This can be done directly by providing the Private ID as one +of the private inputs to the ZKP program and in the program proving +that this Private Id is connected to the Public Id of the transaction +creator. (For example, this could be done by `assert HMAC(private_id, +0) == public_id`.) + +However, this process fails when a transaction has multiple owners +whose approval needs to be proven. It is not possible for the Private +IDs of the other owners to be provided as private inputs to the ZKP +program (because then they would become known to the transaction +creator). To get around this problem, the transaction creator creates +the `transaction_core` data structure, and sends it to the other +owners for their signatures. Each owner computes the hash of the +`transaction_core`, signs this hash, and sends the signature back to +the transaction creator. The transaction creator collects all these +signatures and attaches them along with the transaction as public +inputs when submitting it to the Notary. In addition, the ZKP program +contains code to verify that each of these signatures is a valid +signature (this step does not require knowledge of the Private IDs). + +Should the transaction creator provide a "direct proof" as described +in the first paragraph, or should the transaction creator be treated as +just another owner and its signature included in the list of +signatures that are public inputs? 
The advantage of the former is that +(most likely) it would be more efficient than the latter. The +advantage of the latter is consistency and simplicity of the code? + +TODO: Decide whether the transaction creator will use a "direct proof" +or a signature. Remember, it is possible that single-owner +transactions will be the most common case in many schemas and hence it +might be worth optimizing this case. + +The ZKP digital signature system (used for non-transaction creator +owners) must satisfy all the same properties as those for the standard +digital signature described in the previous section. In addition it +should also satisfy this: + +- The signature verification algorithm of this system must be ZKP friendly + +TODO: What signature system is ZKP friendly? Specifically this needs +to be a standard digital signature scheme i.e. it should be possible +for a third party to verify the validity of a signature knowing just +the public key, and this verification algorithm must be ZKP friendly. + +Due to this requirement, it is possible that the ZKP digital signature +system used is not the same as the standard digital signature +system. + +If we're using the "direct proof" method for the transaction creator then +it has a different set of requirements. Specifically, this method is +equivalent to having a HMAC keyed on the Private ID, with the +following properties: + +- It should be collision resistant and preimage resistant +- It must be ZKP friendly + +TODO: If we are using the "direct proof" method, then decide which +HMAC is used for this + +TODO: Is it necessary for the "other owners" to sign the full +transaction? Can we get away by having them just sign the inputs and +outputs in which they're involved? + +## ZKP Protocols + +At this time, the following three are the most common ZKP protocols +with practical implementations: zk-SNARKs, zk-STARKs, and +BulletProofs. (TODO: keep adding to this list. Should Aurora be here?) 
+ +As per [this +article](https://ethereum.stackexchange.com/questions/59145/zk-snarks-vs-zk-starks-vs-bulletproofs-updated), +it appears only zk-SNARKs provide the performance required by Badal. Specifically, zk-STARKs proof sizes are too large (~45KB) and BulletProofs require 1+s to verify proofs. + +TODO: update this as and when new information becomes available. + +TODO: decide whether BulletProofs are really unacceptable? + +Decision: For now, we only support zk-SNARKs. + +## ZKP Languages / Libraries + +The following are high level languages or libraries that can be used +to quickly and easily write ZKP programs: + +- [Zokrates](https://zokrates.github.io/) +- [circom](https://docs.circom.io/) +- [snarkjs](https://github.com/iden3/snarkjs) +- [arkworks](https://github.com/arkworks-rs) +- Rejected: [Cairo](https://www.cairo-lang.org/): license is not open source + +Note: all the languages/libraries we support must have an appropriate open source licence. + +Decision: We will certainly be supporting Zokrates. Circom also looks promising. + +TODO: evaluate snarkjs and arkworks for suitability + +### Low level zk-SNARK libraries? + +Should we be looking at low level zk-SNARK libraries? Ease of creating new schemas is one of the design goals, and that includes ease of writing ZKP programs. That seems to argue against lower level libraries. + +Potential low-level libraries to look at, if we choose to do so: [Libsnark](https://github.com/scipr-lab/libsnark), [Bellman](https://github.com/zkcrypto/bellman), [Gnark](https://github.com/ConsenSys/gnark). + +## ZKP Proof Systems + +This decision will be conditional upon the choice of the ZKP language/library. + +Zokrates supports: Groth16, GM17, Marlin, PGHR13. + +Circom supports: Groth16, PLONK. + +[Groth16 seems to be fastest](https://github.com/scipr-lab/libsnark/blob/master/libsnark/zk_proof_systems/ppzksnark/README.md). 
But [Groth16 has the strongest cryptographic assumption](https://eprint.iacr.org/2016/260.pdf). The most widely used cryptographic assumptions in non-ZKP systems are Hardness of Factoring (RSA) and Discrete Log (El-gamal encryption, TLS), but Groth16 uses a stronger, relatively nonstandard assumption. + +Of these, PGHR13 is included primarily for historical reasons, because it is the first paper and easy to understand? (TODO: confirm this?). + +Marlin and PLONK have the big advantage that they are "Universal" and would thus not require a per ZKP program setup phase. But are they too new? + +TODO: Create a comprehensive table of the assumptions, advantages, and +disadvantages of each system + +TODO: What are the advantages and disadvantages of GM17? + +TODO: Decide if Marlin is safe enough to use. Decide how to decide this. + +TODO: Decide if PLONK is safe enough to use. Decide how to decide this. + +TODO: Is PLONK a zk-SNARK or something different? + +### Notes on cryptographic assumptions + +TODO: How should we decide which cryptographic assumptions are acceptable for Badal and which ones aren't? + +Should we decide based on ‘oldest and most widely used’ assumption with the largest security-parameter bit length? Or should we do it based on which assumption has had the largest amount of $$ riding on it so far? + +Hardness of Factoring: So use RSA with 512 bits security-parameter? But I think the ‘default’ libsnark field is 254 bits and cannot be used for proof of pre-image of RSA public key with 512 bit fields. This suggests that if we use non-default higher bitlength fields within libsnark the proof overheads may get very high? + +There probably are no practical ZKP systems whose underlying cryptographic assumption is Hardness of Factoring + +Discrete Log: El-gamal encryption and TLS security relies on this. Not sure + +## Other ZKP Parameters + +TODO: Figure out the pros and cons of using different curves. 
+ +## Operational Considerations + +TODO: For each combination of ZKP parameters we need to figure out +what is the impact on processes to be followed at various stages in +the Badal lifecycle + +These are the important milestones in the Badal lifecycle: + +- Initializing Badal +- Adding a notary +- Adding a ledger +- Adding a schema +- Submitting a transaction +- Creating an _ad hoc_ ZKP program and proof + +For example in Zokrates with Groth16: + +- Initializing Badal: At this time we would need a problem/Circuit + independent Trusted Setup (_i.e._ the perpetual powers of tau + ceremony). Would this require an MPC computation that could be + performed by hundreds of finance ministry personnel? +- Adding a notary: ?? +- Adding a ledger: ?? +- Adding a schema: TODO: Trusted setup would need to be repeated for each + ZKP Program in the schema? How will this be done exactly? How are + variable number of inputs/outputs handled? Will the trusted setup + have to be repeated for each unique combination, at runtime? +- Submitting a transaction: Nothing special? +- Ad hoc proof: TODO: Trusted setup would be repeated? Who runs it? + How are the prover key and verifier key distributed? + +TODO: repeat with: Zokrates+Marlin, circom+Groth16, circom+PLONK. + + + +# Appendix + +## Rejected Ideas + +### Other Owners directly submitting ZKPs to the Notary instead of a designated transaction creator + +One possibility discussed was that of all the owners providing the +full ZKP for the transaction and these proofs going to the Notary. I +believe this would involve a lot of duplication of proof-generation +effort (i.e. each owner independently proving that the sum of input +amounts is the same as the sum of output amounts). Hence, this option +doesn't seem worth doing. 
+ +### Other Owners Using ZKP instead of digital signatures + +One simplification of this scheme is possible: that the transaction +creator creates a full ZKP for the transaction, while the other owners +create a ZKP of just the fact that they (a) approve of the +transaction core and (b) know the private id associated with +their public id. To me, prima facie it seems that this will either be +more expensive than the digital signature based scheme, or it will end +up being equivalent to it. diff --git a/docs/design_rationales.md b/docs/design_rationales.md new file mode 100644 index 0000000..73196f5 --- /dev/null +++ b/docs/design_rationales.md @@ -0,0 +1,96 @@ +# Design Discussions and Rationales + +This document has sections exploring the major design alternatives, including summaries of the trade-offs, links to references with more information, and implications for PPL if any. + +As and when we actually take decisions, this will be updated with the rationale for our choice. + +## UTXO Model vs Account Model + +There are two important models for organizing the information on a blockchain: the UTXO model used by Bitcoin and the account model used by Ethereum. [This article][horizen] gives a good overview of both: what the UTXO and account models are, and the architectural tradeoffs involved. + +In the UTXO model, each transaction has one or more inputs which indicate which specific coins are being spent (they have to be unspent outputs of earlier transactions, hence the name unspent transaction outputs) and has one or more outputs indicating how many coins are being given to which addresses. Once this transaction is confirmed, the outputs of the transaction become new UTXOs and the inputs are no longer UTXOs because they are no longer unspent. Note that the total amount owned by any particular address is not stored/maintained by the blockchain, and is not needed to validate any transaction; just ensuring that the inputs are unspent is good enough. 
Clients who want to know their total balance need to compute and maintain the balance themselves. + +In the account model, the system maintains the current balance in each account. Each transaction indicates the amounts to be deducted from the input accounts and amounts to be added to the output accounts. The validation of a transaction involves checking that the input accounts have balances that are at least as large as the amounts being deducted. Note that the system does not keep track of individual coins: if two different transactions deposit 10 ether each in the same account giving a balance of 20, and then 10 ether is spent, there is no way to know whether this 10 came from the first transaction or the second. + +One important note: in the account model, a transaction of the type "send 10 eth to account x" could be sent multiple times to the blockchain and would result in multiple withdrawals unless we take care to prevent this kind of a double spend. (This is not possible in the UTXO model since one UTXO can only be spent once.) Ethereum solves this problem by maintaining a "nonce" as part of the state of each account. This nonce is essentially a transaction counter which is incremented with every transaction. When a transaction is submitted to the blockchain, it needs to include this nonce, and the miner ensures that the nonce in the transaction matches the current nonce in the account. This point will become important for scalability later. + +Architectural impact of UTXO vs account model: + +- **Privacy**: The UTXO and account models give different types of privacy. + + In the UTXO model, if a user creates a new address every time a UTXO + is spent, it would be difficult to link accounts to each other. This + is something that can be done easily in a UTXO system. Doing + something similar in an account based system would require much more + work. Thus, the UTXO model allows more privacy via lower linkability of + transactions. 
+ + On the other hand, in UTXO, following the movement of a particular + set of coins through the system is easy. In an account model, this + can't be done for more than one step since the individual coins + don't have an independent existence. Thus, the account model + provides more privacy via fungibility. + + Depending on the application context, one or the other might be more + desirable. + +- **Scalability**: The UTXO model allows more parallelization of + transactions because two different transactions for the same account + can go ahead independently of each other as long as they use + different UTXOs for inputs. By contrast, the account model, through + the use of the transaction nonce forces serialization of all the + transactions on one account. This increases the effort required to + increase transaction throughput via parallelization and sharding. + + However, the account model has a different scalability advantage + over the UTXO model. The account model only needs to keep track of + the final balance of each account, and not each and every individual + UTXO. As a result the total size of the state that needs to be + maintained by a node is much smaller. This not only reduces the + memory requirement of any one node, but also reduces the time + required for a new node joining the system to sync up with the + current state of the blockchain. + +- **Ease of Programming**: The account model is closer to how people + think in the real world, and as a result it is easier to write + programs and smart contracts for. + +- **Light Clients**: If additional capabilities are layered on top of + the blockchain (for example, the Lightning network on top of + Bitcoin), then the use of UTXO model makes certain kinds of layers + easier while the account model makes other kinds of layers easier to + construct. + + Consider an example from the Lightning Network. 
Here, Alice and Bob + can maintain a running balance between them by Alice creating + off-chain transactions based on a UTXO and sending the signed + transaction to Bob. Bob does not submit this transaction to the blockchain. + Instead, when another small payment needs to be made, Alice creates + an updated transaction against the same UTXO, consisting of the sum of + the payments so far. This new transaction replaces the previous + transaction. When Bob wants to settle the accounts, he submits just + the last transaction to the blockchain. + + Due to the nonces, this kind of a scheme is not easy to implement in + the account model. + + By contrast, the account model makes it easier to implement light + clients which need to provide services related to just one account. + In the case of the account model, it is easy to just fetch the current + state related to that one account without having to fetch the entire + state. (For example, in Ethereum, the Merkle Patricia Tree allows + verifiable fetching of the data of one account by just going down + one path of the tree.) In the UTXO model, to be able to do something + like this effectively, a client would need to watch and analyze all + the transactions on the blockchain. + +To understand these issues in more detail, read the [Horizen article][horizen] (referenced earlier) as well as the [original design rationale][ethrationale] published by Ethereum itself. + +Does a hybrid model make sense? + +It is possible that building a UTXO model at the base, and layering an account model on top of it might give the best of both worlds. The [Horizen article][horizen] points out that [QTUM][qtum] implements such a hybrid model. 
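
The replay-protection difference between the two models discussed above can be sketched in a few lines of Python. This is only an illustration: `AccountLedger` and `UtxoLedger` are hypothetical names, signatures and fees are ignored, and neither class reflects how Bitcoin, Ethereum, or PPL actually store state.

```python
class AccountLedger:
    """Account model: one balance and one nonce (transaction counter) per account."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.nonces = {account: 0 for account in balances}

    def transfer(self, sender, receiver, amount, nonce):
        # Replay protection: the nonce must equal the sender's current counter.
        if sender not in self.balances or nonce != self.nonces[sender]:
            return False
        if amount <= 0 or self.balances[sender] < amount:
            return False
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        self.nonces[sender] += 1
        return True


class UtxoLedger:
    """UTXO model: only the set of unspent outputs is tracked, not balances."""

    def __init__(self, unspent):
        self.unspent = dict(unspent)  # utxo_id -> amount

    def spend(self, input_ids, outputs):
        # Every input must be distinct and currently unspent.
        if len(set(input_ids)) != len(input_ids):
            return False
        if any(utxo_id not in self.unspent for utxo_id in input_ids):
            return False
        # Total of the inputs must equal the total of the outputs.
        if sum(self.unspent[i] for i in input_ids) != sum(outputs.values()):
            return False
        for utxo_id in input_ids:
            del self.unspent[utxo_id]
        self.unspent.update(outputs)
        return True


accounts = AccountLedger({"alice": 20, "bob": 0})
first_transfer = accounts.transfer("alice", "bob", 10, nonce=0)
replayed_transfer = accounts.transfer("alice", "bob", 10, nonce=0)  # same nonce again

utxos = UtxoLedger({"u1": 10, "u2": 10})
first_spend = utxos.spend(["u1"], {"u3": 10})
second_spend = utxos.spend(["u1"], {"u4": 10})  # u1 is no longer unspent
```

In the account sketch the replay is rejected only because of the nonce, which is also what forces the transactions of one account into a single order. In the UTXO sketch the second spend fails simply because `u1` is gone, while a spend of `u2` could proceed independently.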
+ + +[horizen]: https://academy.horizen.io/technology/expert/utxo-vs-account-model/ +[ethrationale]: https://eth.wiki/en/fundamentals/design-rationale +[qtum]: https://blog.qtum.org/qtums-utxo-design-decision-d3cb415a3a6e?gi=d822a092f221 diff --git a/docs/ppl-zkp-design.md b/docs/ppl-zkp-design.md new file mode 100644 index 0000000..7ecde1a --- /dev/null +++ b/docs/ppl-zkp-design.md @@ -0,0 +1,121 @@ +# ZKP Designs for Transactions + +-- Navin Kabra + +_Status: v0.02. This is an early, unreviewed draft, and the entire design is subject to change. This version assumes that as far as wallet providers are concerned the "notary" appears as a single/centralized entity. If there is any decentralization of the notary, either for availability or for trust purposes, that is assumed to be hidden behind an API_ + +## UTXO based design + +We assume that a wallet can handle states of different types of UOMs. + + +### Wallet + +A wallet consists of: + +- WalletID +- PublicKey + +(Note: we could replace PublicKey with a list, allowing for multiple owners with 1-of-N or k-of-N semantics. For simplicity, this functionality is not described here, but the changes for this would be minimal.) + +### State + +A state consists of: + +- StateID (unique across the system) +- WalletID +- Amount +- UOM +- StateHash + +The StateHash is simply a hash of the StateID+WalletID+Amount+UOM. + +A state represents an `amount` of `UOM` that has been deposited into a wallet. A state can be _active_ or _dropped_. An active state is a UTXO, indicating that the amount represented by it has not yet been used in another transaction, so it is a valid candidate to be an input for a future transaction. A dropped state is one which has been used in a transaction, and hence cannot be used again. 
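
As a rough illustration of the state structure described above, here is a minimal Python sketch. The field types, the choice of SHA-256, and the `|` separator used to join the fields before hashing are all assumptions made for this example; the design does not yet fix a hash function or an encoding.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical in-memory form of a state; the field types are assumed."""
    state_id: str
    wallet_id: str
    amount: int
    uom: str

    def state_hash(self) -> str:
        # Assumption: SHA-256 over StateID, WalletID, Amount, and UOM joined by '|'.
        payload = "|".join([self.state_id, self.wallet_id, str(self.amount), self.uom])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two states that differ only in StateID still get different hashes.
s1 = State("state-001", "wallet-A", 100, "inr")
s2 = State("state-002", "wallet-A", 100, "inr")
```

Including the StateID in the hash is what keeps two otherwise identical deposits into the same wallet distinguishable.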
+ +Any transaction consists of taking one or more active states (from one or more wallets), dropping them, and redistributing the total UOM from the input states into one or more newly created output states which will now be active states. + +In the rest of this document we use the words _spent_ and _unspent_ as synonyms for dropped and active states respectively, because it makes the examples easier to understand, but keep in mind that a sleeve can hold non-currency values in which case the words spent and unspent would be misleading. + + +### Simple Transaction + +We will go through an example transaction with just one UOM, two inputs and two outputs. + +#### Sender steps + +In this case, the owner creates a transaction core data structure as follows: + +**TxCore** + +- Input states + - SIn1: SIn1ID, SIn1WalletID, SIn1Amount, SIn1UOM, SIn1Hash + - SIn2: SIn2ID, SIn2WalletID, SIn2Amount, SIn2UOM, SIn2Hash +- Output states + - SOut1: SOut1ID, SOut1WalletID, SOut1Amount, SOut1UOM, SOut1Hash + - SOut2: SOut2ID, SOut2WalletID, SOut2Amount, SOut2UOM, SOut2Hash +- TxCoreHash = Hash(SIn1Hash + SIn2Hash + SOut1Hash + SOut2Hash) + +Note: this transaction core data structure is known and visible only to the owner. At no point are the WalletIDs and Amounts of any of the states revealed to any of the notaries or on the public chain. The details of SOut1 and SOut2 will be sent to the receivers by the sender using some other communication method. + +Based on this, the sender creates a transaction packet consisting of this information: + +**Transaction Packet** + +- TxCoreHash +- SIn1ID, SIn1Hash, TxCoreHash, SIn1Proof +- SIn2ID, SIn2Hash, TxCoreHash, SIn2Proof +- SOut1ID, SOut1Hash +- SOut2ID, SOut2Hash +- TxProof + +Here, SIn1Proof is a zero knowledge proof of the following: + +- SIn1Hash is a valid hash of the contents of SIn1 +- Proof of knowledge of the PrivateKey matching the PublicKey of the wallet identified by SIn1WalletID. (Note: this part needs to be fleshed out a bit more.) + +Similarly for SIn2Proof. 
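
The TxCoreHash from the TxCore structure above could be computed along these lines. This is a sketch under stated assumptions: the `+` in the formula is read as byte concatenation, state hashes are taken to be hex strings, and SHA-256 stands in for whichever hash function is eventually chosen. The function name `tx_core_hash` is hypothetical.

```python
import hashlib


def tx_core_hash(input_hashes, output_hashes):
    """Hash the state hashes in a fixed order: all inputs first, then all outputs."""
    digest = hashlib.sha256()
    for state_hash in [*input_hashes, *output_hashes]:
        # Assumption: each state hash is a hex string; "+" means concatenation.
        digest.update(bytes.fromhex(state_hash))
    return digest.hexdigest()


# Placeholder values standing in for SIn1Hash, SIn2Hash, SOut1Hash, and SOut2Hash.
sin1, sin2, sout1, sout2 = (
    hashlib.sha256(name.encode("utf-8")).hexdigest()
    for name in ("SIn1", "SIn2", "SOut1", "SOut2")
)
core_hash = tx_core_hash([sin1, sin2], [sout1, sout2])
```

Because the hash depends on the order of its inputs, the sender and anyone re-deriving the TxCoreHash would need to agree on a canonical ordering of the states.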
+ +TxProof is a zero knowledge proof of the following: + +- SIn1Amount + SIn2Amount = SOut1Amount + SOut2Amount +- SOut1Amount >= 0 +- SOut2Amount >= 0 +- TxCoreHash = Hash(SIn1Hash + SIn2Hash + SOut1Hash + SOut2Hash) + +(Note: in reality, the first line of TxProof will be more complicated than this because it has to account for the fact that there might be different UOMs involved, and the input totals for each UOM must add up to the output totals for that UOM.) + +The transaction packet created above is sent to the Notary. (Note: the transaction core data structure is not sent to the notary. Only the transaction packet.) + +#### Notary steps + +Notary first verifies the inputs by checking that each one represents an unspent state and that the sender has permission to spend this input. Then Notary verifies the integrity of the full transaction by checking that the total of the output amounts matches the total of the input amounts. + +First, Notary confirms that SIn1ID does not appear as an input of any past valid transaction. The system guarantees that if any state is dropped, the corresponding StateID would show up as the input of some transaction that _must_ be sent to Notary and saved in the public ledger before the transaction is considered final. This check guarantees SIn1ID cannot be double spent. + +Next, Notary finds the past transaction that contains SIn1ID as one of its outputs. The system guarantees that one and only one such transaction will exist and it will be present in the public ledger. Let's call this transaction TxIn1. + +Now Notary verifies that SIn1Hash matches the corresponding StateHash in the outputs of TxIn1. Note: this guarantees that SIn1WalletID, SIn1Amount, and SIn1UOM (which Notary does not know) also match. + +Notary repeats this for each of the inputs. + +Finally, Notary verifies the TxProof. 
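
The notary checks above can be sketched in plain Python. This is an illustration only: `Ledger`, `PacketInput`, `TransactionPacket`, and `notary_accepts` are hypothetical names, the ledger is reduced to two in-memory lookups, and proof verification is passed in as a callback because no ZKP backend has been chosen. The rejection of duplicate inputs within one packet is an extra assumption not spelled out in the text.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PacketInput:
    state_id: str
    state_hash: str


@dataclass
class TransactionPacket:
    tx_core_hash: str
    inputs: list
    outputs: dict  # StateID -> StateHash of each newly created state
    tx_proof: bytes = b""


@dataclass
class Ledger:
    """Hypothetical view of the public ledger, indexed for the notary's lookups."""
    spent_ids: set = field(default_factory=set)        # StateIDs used as inputs so far
    output_hashes: dict = field(default_factory=dict)  # StateID -> StateHash of past outputs

    def record(self, packet: TransactionPacket) -> None:
        self.spent_ids.update(inp.state_id for inp in packet.inputs)
        self.output_hashes.update(packet.outputs)


def notary_accepts(ledger: Ledger, packet: TransactionPacket,
                   verify_proof: Callable[[TransactionPacket], bool]) -> bool:
    ids = [inp.state_id for inp in packet.inputs]
    if len(set(ids)) != len(ids):
        return False  # assumed extra check: same state listed twice in one packet
    for inp in packet.inputs:
        # The input must not appear as an input of any past transaction.
        if inp.state_id in ledger.spent_ids:
            return False
        # The input must be an output of a past transaction, with a matching hash.
        if ledger.output_hashes.get(inp.state_id) != inp.state_hash:
            return False
    # Finally, the proof itself is checked (stubbed out here).
    return verify_proof(packet)


ledger = Ledger()
ledger.output_hashes["s-in-1"] = "h1"  # pretend an earlier transaction created this state
packet = TransactionPacket("core-hash", [PacketInput("s-in-1", "h1")], {"s-out-1": "h2"})
first_result = notary_accepts(ledger, packet, lambda p: True)
ledger.record(packet)
second_result = notary_accepts(ledger, packet, lambda p: True)  # same input again
```

The second call fails at the first check, which is the double-spend guarantee described above.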
+ +At this point, Notary is convinced that the transaction is valid, and writes the transaction packet to its public ledger along with its own signature and the transaction is considered complete. + +#### Closing the loop + + +At this point, the transaction initiator sends the details of SOut1 and SOut2 (amounts, WalletIDs, and UOMs) to the receivers using some other communication method. The receivers can check the public ledger and confirm that the SOut1Hash and SOut2Hash values match the expected values, ensuring that they have indeed received the claimed amounts. + +At no point are the WalletIDs and Amounts of any of the states revealed to any of the notaries or on the public chain. + +### Complex Transaction + +TODO: + +1. Sketch out a transaction in which there are multiple UOMs involved +2. Sketch out a transaction in which the inputs are from different owners + +The basic idea is to think of this transaction as having an initiator who puts together the "deal" (which is essentially an agreement about what inputs are provided by whom and what outputs are sent to whom). Then the initiator secures permission for this deal from all the owners of the input states. Once all the permissions are secured, the initiator then finalizes this transaction on the public ledger. + +The rest of the design is still being worked on. diff --git a/notes/todos.md b/notes/todos.md index d210bd0..e862f7e 100644 --- a/notes/todos.md +++ b/notes/todos.md @@ -24,4 +24,39 @@ * Create a journal module to allow clear communication from notary to journal api. 
* This is where all the future concurrency related stuff will get merged -* \ No newline at end of file + +## ZKP + +- Zokrates implementation of a stablecoin + - Hash function + - Simple transfer (1 input 1 output) + - Multi-transfer (2 inputs 2 outputs) + - Work on "generic" implementation + - PoC for processes to be followed: + - At the time of ledger creation + - At the time of schema creation + - For 2 parties to agree on a ad hoc proof + - Different schemes: + - G16 + - Marlin + - Also different backends: bellman/libsnark/ark + - Different hash functions: + - Pedersen + - Poseidon + - MiMCHash? + - SHA256 + - Performance test suite + - Hashing strings of various lengths + - Proof for basic 2x2 transfer transaction + - Proof for 1xN transfer transaction (n=10,50,100,1000,10000) + - Be able to easily run all the above with different schemes+hashfunctions + - Implement common API +- Circom implementation of a stablecoin + - Similar to Zokrates + - Supported schemes and hash functions will be different +- Common API + - Understand DSL requirements + - Define common API based on Zokrates, Circom, and maybe one more + - Write (failing) tests for the API + + diff --git a/tests/badal/dsl/stableinr_schema.py b/tests/badal/dsl/stableinr_schema.py new file mode 100644 index 0000000..108dbc5 --- /dev/null +++ b/tests/badal/dsl/stableinr_schema.py @@ -0,0 +1,176 @@ +# WARNING: Obsolete. This code is not consistent with the +# latest thinking. See badal/dsl/README.md for the +# latest thinking + + +from ppl.badal.dsl import spec as s + +class PublicId(s.Attribute): + ''' + Automatically generates: + - In badal.schema: Appropriate AttributeType subclass + - In ZKP: struct for this attribute type and methods + to initialize and otherwise manipulate this + attribute + ''' + pass + + +class Amount(s.Attribute): + def __init__(self, uom: str, precision: int = 2, **kwargs): + ... 
+ +class Notes(s.Attribute): + pass + + +class Utxo(s.State): + ''' + Automatically generates: + - In badal.schema: Appropriate StateType subclass + Also, code to automatically convert + from Python native data types to + the correct AttributeType. + Thus: `utxo.amount = 42` should work fine + - In ZKP: struct for this state and functions to manipulate them + function to generate ID of this state? + function to hash this state + ''' + owner_id = PublicId() + amount = Amount(uom='inr', precision=3) + + def __init__(self, + from_id: str|PublicId, + to_id: str|PublicId, + amount: int|Amount): + self.amount = amount + self.from_id = from_id + self.to_id = to_id + + +class AmountsMatchClaim(s.ZokratesClaim): + ''' + Check that the sum of input amounts matches the sum of output amounts + + Each input is a tuple of (param_name, expression) + + The param_name must match one of the parameters of the + function in the code + + The expression must be a valid expression for extracting an + appropriate Zokrates type from one of the states of the transaction + + TODO: N and M: how are they specified/fixed? + Does this automatically fix the size of transaction_content? 
+ ''' + function_name = 'amounts_match' + code=''' + def amounts_match(TxnBinary txn_binary): + u64 input_sum = 0 + for u32 i in 0..N do + input_sum = input_sum + txn_binary.inputs[i].amount.value + endfor + + u64 output_sum = 0 + for u32 j in 0..M do + output_sum = output_sum + txn_binary.outputs[j].amount.value + endfor + + assert(input_sum == output_sum) + ''' + + +class UOMsMatchClaim(s.ZokratesClaim): + '''Checks that all UOMs in the transaction are the same''' + function_name = 'uoms_match' + code=''' + def uoms_match(TxnBinary txn_binary): + u32 first_uom = txn_binary.inputs[0].amount.uom + + for u32 i in 1..N do + assert(first_uom == txn_binary.inputs[i].amount.uom) + endfor + + for u32 j in 0..M do + assert(first_uom == txn_binary.outputs[j].amount.uom) + endfor + ''' + + +class ValidSignaturesClaim(s.ZokratesClaim): + ''' + Check that the signatures are valid, and the necessary ones are there + ''' + @s.ZokratesClaim.input + def input_public_ids(self, + txn: s.Transaction) -> s.ZokratesClaim.Array: + return self.to_array([self.to_field(input.to_id) + for input in txn.inputs]) + + @s.ZokratesClaim.input + def content(self, + txn: s.Transaction) -> s.ZokratesClaim.Array: + '''Convert the core transaction data into an array of fields + + This "content" is what is going to be "signed" by all the + existing owners of the input IOUs + + TODO: How do we handle "size"? + ''' + return self.to_array( + [self.to_field()] + ) + + @s.ZokratesClaim.input + def signatures(self, + txn: s.Transaction) -> s.ZokratesClaim.MultiArray: + return self.to_multi_array( + [self.to_field(sig for sig in txn.signatures)], + shape=(2, len(txn.signatures)) + ) + + code=''' + def valid_signatures(field[N] input_public_ids, + field[?] 
transaction_content, + field[2][N] signatures): + for u32 i in 0..N do + field[2] sig = ppl_signature(transaction_content, input_public_ids[i]) + ppl_signature_check(sig, signatures[i]) + endfor + ''' + + +class Transfer(s.Transaction): + '''Payment cancelling multiple IOUs and creating multiple new ones''' + claims = (AmountsMatchClaim, UtxoTypesMatchClaim, ValidSignaturesClaim) + + def __init__(self, + inputs: list[Utxo], + outputs: list[Utxo]) -> None: + self.inputs = inputs + self.outputs = outputs + self.signatures: list[s.Signature] = [] + + def sign(self, signatures: list[s.Signature]) -> None: + self.signatures = signatures + # We probably want to check the signatures in python + # first before creating the proof + # That would be far more efficient + + def create_proof(self) -> s.Proof: + return '' + + +class StableINR(s.Spec): + uri = 'http://ispirt.org/stableinr/spec' + name = 'INR StableCoin Spec' + version = '0.1' + contract_model = 'zokrates_one_oh' + signature_type = '' + + attributes = [PublicId, Amount, Notes] + states = [Utxo,] + transactions = [Transfer,] + actions: list[s.Action] = [] + +