diff --git a/rfcs/rfc-2-evaluation-framework.md b/rfcs/rfc-2-evaluation-framework.md
new file mode 100644
index 0000000..b1ee125
--- /dev/null
+++ b/rfcs/rfc-2-evaluation-framework.md
@@ -0,0 +1,436 @@

# RFC 2: ProofLab Evaluation Framework

## Overview

This RFC proposes a comprehensive framework for evaluating Zero-Knowledge Virtual Machines (zkVMs) within the ProofLab ecosystem. The framework provides standardized metrics and methodologies for assessing the performance, security, and usability of zkVMs, enabling fair comparisons and informed decisions for users and developers.

## Motivation

As the zero-knowledge proof ecosystem rapidly evolves, with multiple zkVMs competing for adoption, users need objective criteria to evaluate the alternatives. This framework aims to:

1. Establish standardized metrics for comparing zkVM implementations
2. Guide future development priorities for ProofLab
3. Provide users with transparent data to select the most appropriate zkVM for their specific use cases
4. Create benchmarks that can track the progress of zkVM technology over time

## Specification

Each metric in this framework resolves to a discrete numerical value on a scale of 1-10, enabling the creation of radar charts for visual comparison between zkVMs. This approach facilitates at-a-glance evaluation while preserving the depth of the underlying assessment.

### I. Performance Metrics

These metrics measure the efficiency and speed of zkVMs.

#### P1: Proof Generation Time

- **Description**: Time taken by the zkVM to generate a proof for a given computation
- **Evaluation Method**:
  1. Run a standardized suite of benchmark programs on identical hardware
  2. Measure the average proof generation time across all benchmarks
  3. Calculate the score from performance relative to the other zkVMs under evaluation:
     - 10: Top 10%
     - 7-9: Next 20% (top 11-30%)
     - 4-6: Middle 40%
     - 1-3: Bottom 30%
- **Required Testing**: Minimum of 5 diverse benchmark programs, run 3 times each, with consistent hardware specifications
- **Implementation**: Automated via ProofLab's benchmarking module, storing results in a structured database

#### P2: Verification Cost

- **Description**: Cost (time, gas, or other resources) required to verify a generated proof
- **Evaluation Method**:
  1. Measure both on-chain gas costs and off-chain verification time
  2. Normalize measurements against a baseline implementation
  3. Calculate a weighted score (60% gas cost, 40% verification time; sketched below):
     - 10: <10% of baseline cost
     - 8-9: 10-25% of baseline cost
     - 6-7: 25-50% of baseline cost
     - 4-5: 50-100% of baseline cost
     - 1-3: >100% of baseline cost
- **Required Testing**: Deploy verification contracts on standard test networks, measure gas costs for at least 10 different proofs
- **Implementation**: Automated contract deployment and gas measurement tools integrated with ProofLab's telemetry system
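To make the normalization and weighting in P2 concrete, here is a minimal sketch of how raw measurements could be folded into a 1-10 score. The function name and the placement of scores inside each multi-point band are illustrative assumptions; only the band boundaries above come from this specification.

```rust
/// Illustrative sketch of P2 scoring (not a normative implementation).
/// `baseline_*` are the measurements for the agreed baseline verifier.
fn p2_verification_cost_score(
    gas_used: f64,
    verify_time_ms: f64,
    baseline_gas: f64,
    baseline_time_ms: f64,
) -> u8 {
    // Step 2: normalize against the baseline (1.0 == exactly baseline cost).
    let gas_ratio = gas_used / baseline_gas;
    let time_ratio = verify_time_ms / baseline_time_ms;

    // Step 3: weighted cost, 60% gas and 40% verification time.
    let weighted = 0.6 * gas_ratio + 0.4 * time_ratio;

    // Map the weighted ratio onto the 1-10 bands above; picking one score
    // inside each multi-point band is an arbitrary choice for illustration.
    match weighted {
        w if w < 0.10 => 10,
        w if w < 0.25 => 9,
        w if w < 0.50 => 7,
        w if w < 1.00 => 5,
        _ => 2,
    }
}
```

A benchmarking harness would call this once per proof and aggregate across the 10+ proofs required above.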
#### P3: Resource Efficiency

- **Description**: Efficiency of resource utilization during proof generation
- **Evaluation Method**:
  1. Measure CPU, memory, and disk usage during proof generation
  2. Calculate a composite efficiency score based on:
     - CPU efficiency: instructions per second relative to resource usage
     - Memory efficiency: proof complexity relative to memory usage
     - Storage efficiency: minimizing temporary disk usage
  3. Score on a scale of 1-10 based on resource efficiency percentile:
     - 10: 90th percentile or above
     - 7-9: 70-89th percentile
     - 4-6: 40-69th percentile
     - 1-3: Below 40th percentile
- **Required Testing**: Detailed profiling during benchmark runs with standardized resource monitoring
- **Implementation**: Resource monitors integrated with ProofLab's benchmarking tools

#### P4: Proving System Scalability

- **Description**: How well the zkVM's proving system scales with increasing computation complexity
- **Evaluation Method**:
  1. Measure proof generation time for computations of increasing complexity (linear, polynomial, and exponential complexity classes)
  2. Plot complexity vs. time and determine the scaling factor (see the fitting sketch at the end of this section)
  3. Score based on scaling behavior:
     - 10: Sub-linear scaling for complex operations
     - 7-9: Near-linear scaling
     - 4-6: Polynomial scaling with a low exponent
     - 1-3: Polynomial scaling with a high exponent, or worse
- **Required Testing**: Run benchmarks with systematic variation in computation complexity
- **Implementation**: Specialized scalability test suite integrated with ProofLab

#### P5: Proof Size Efficiency

- **Description**: Size of the generated proofs relative to computation complexity
- **Evaluation Method**:
  1. Measure proof sizes across the benchmark suite
  2. Calculate proof size relative to computation complexity
  3. Score based on size efficiency:
     - 10: Constant-sized proofs regardless of computation
     - 7-9: Near-constant-sized proofs with minimal growth
     - 4-6: Sub-linear growth with computation complexity
     - 1-3: Linear or super-linear growth
- **Required Testing**: Measure proof sizes for all benchmark programs
- **Implementation**: Automatic measurement and recording in benchmark results
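The "scaling factor" in P4 step 2 can be estimated with a least-squares fit on log-log data: if proof time behaves like `t ≈ c · n^k`, the slope of `ln t` against `ln n` recovers the exponent `k`. The sketch below assumes that model; the function name and the exact cut-offs between bands are illustrative assumptions.

```rust
/// Illustrative estimate of the P4 scaling exponent from
/// (complexity, proving time) samples, assuming time ≈ c * complexity^k.
fn p4_scaling_score(samples: &[(f64, f64)]) -> u8 {
    assert!(samples.len() >= 2, "need measurements at several complexity levels");
    let n = samples.len() as f64;
    let (mut sx, mut sy, mut sxx, mut sxy) = (0.0, 0.0, 0.0, 0.0);
    for &(complexity, time) in samples {
        let (x, y) = (complexity.ln(), time.ln());
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    // Least-squares slope of ln(time) vs ln(complexity) == exponent k.
    let k = (n * sxy - sx * sy) / (n * sxx - sx * sx);

    // Map the exponent onto the bands above; thresholds are illustrative.
    match k {
        k if k < 0.95 => 10, // sub-linear
        k if k < 1.15 => 8,  // near-linear
        k if k < 2.50 => 5,  // polynomial, low exponent
        _ => 2,              // high exponent or worse
    }
}
```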
### II. Security Metrics

These metrics evaluate the security properties of zkVMs and their implementations.

#### S1: Cryptographic Soundness

- **Description**: Theoretical security guarantees of the underlying proving system
- **Evaluation Method**:
  1. Evaluate the cryptographic assumptions and peer-reviewed security proofs
  2. Consider quantum-resistance properties
  3. Score based on formal security guarantees:
     - 10: Post-quantum secure with formal security proofs
     - 7-9: Classically secure with strong formal security proofs
     - 4-6: Classically secure under heuristic assumptions
     - 1-3: Experimental or unproven cryptography
- **Required Testing**: Analysis of the cryptographic literature and formal security proofs
- **Implementation**: Expert review panel assessment stored in ProofLab's evaluation database

#### S2: Trusted Computing Base (TCB) Size

- **Description**: Amount of code that must be trusted for the zkVM to function securely
- **Evaluation Method**:
  1. Measure lines of code in the critical security path
  2. Count the number and complexity of dependencies
  3. Calculate a TCB score inversely proportional to size:
     - 10: Minimal TCB (<5K LOC, few dependencies)
     - 7-9: Small TCB (5K-20K LOC)
     - 4-6: Medium TCB (20K-100K LOC)
     - 1-3: Large TCB (>100K LOC, many dependencies)
- **Required Testing**: Static code analysis of the zkVM implementation and its dependencies
- **Implementation**: Automated code analysis tools integrated with ProofLab

#### S3: Formal Verification Coverage

- **Description**: Percentage of security-critical code that has been formally verified
- **Evaluation Method**:
  1. Calculate the percentage of security-critical components with formal verification
  2. Weight by the importance of each component (see the sketch at the end of this section)
  3. Score based on weighted coverage:
     - 10: >90% of critical code formally verified
     - 7-9: 70-90% verified
     - 4-6: 30-70% verified
     - 1-3: <30% verified
- **Required Testing**: Analysis of formal verification artifacts
- **Implementation**: Integration with formal verification reporting tools

#### S4: Security Testing Maturity

- **Description**: Comprehensive assessment of security testing practices
- **Evaluation Method**:
  1. Evaluate fuzzing and symbolic execution coverage
  2. Assess test quality and coverage of edge cases
  3. Weigh the following factors:
     - Fuzzing coverage percentage
     - Number and diversity of security tests
     - External security audits
  4. Score based on overall testing maturity:
     - 10: Comprehensive testing with >90% coverage and multiple audits
     - 7-9: Strong testing with 70-90% coverage
     - 4-6: Basic testing with 40-70% coverage
     - 1-3: Limited testing with <40% coverage
- **Required Testing**: Review of test suites and execution of coverage analysis
- **Implementation**: Integration with code coverage and fuzzing tools

#### S5: Vulnerability Management

- **Description**: Quality of the vulnerability management process
- **Evaluation Method**:
  1. Analyze historical vulnerability data
  2. Assess response time for critical vulnerabilities
  3. Evaluate transparency of security processes
  4. Score based on a composite assessment:
     - 10: Proactive security with rapid responses (<1 week) and full transparency
     - 7-9: Strong security processes with good response times
     - 4-6: Adequate vulnerability management
     - 1-3: Poor or unproven vulnerability management
- **Required Testing**: Historical analysis of security bulletins and responsiveness
- **Implementation**: Security incident database integrated with ProofLab's evaluation framework
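S3's importance weighting reduces to a weighted average, sketched below. The `Component` shape and its field names are hypothetical; the RFC only fixes the idea that each component's verified share counts in proportion to its importance.

```rust
/// Hypothetical input for the S3 calculation: one entry per
/// security-critical component identified during review.
struct Component {
    importance: f64,        // reviewer-assigned weight
    verified_fraction: f64, // share formally verified, in 0.0..=1.0
}

/// Weighted coverage (%) = 100 * Σ(importance_i * verified_i) / Σ(importance_i).
fn s3_weighted_coverage(components: &[Component]) -> f64 {
    let total: f64 = components.iter().map(|c| c.importance).sum();
    let verified: f64 = components
        .iter()
        .map(|c| c.importance * c.verified_fraction)
        .sum();
    if total == 0.0 { 0.0 } else { 100.0 * verified / total }
}
```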
### III. Usability Metrics

These metrics consider the developer and user experience of zkVMs.

#### U1: Development Complexity

- **Description**: Effort required to develop applications for the zkVM
- **Evaluation Method**:
  1. Measure lines of code required for standard operations
  2. Evaluate the learning curve through structured developer experiments
  3. Score based on complexity relative to conventional (non-ZK) development:
     - 10: Minimal additional complexity over standard development
     - 7-9: Moderate learning curve with good developer experience
     - 4-6: Significant but manageable learning curve
     - 1-3: Complex development process requiring specialized knowledge
- **Required Testing**: Standardized developer tasks with measured time-to-completion
- **Implementation**: Developer experience labs integrated with ProofLab

#### U2: Tooling Maturity

- **Description**: Quality and completeness of development tools
- **Evaluation Method**:
  1. Inventory available tools (compilers, debuggers, IDEs, etc.)
  2. Assess tool quality through structured evaluation
  3. Score based on tooling completeness:
     - 10: Full suite of mature tools with excellent integration
     - 7-9: Good tooling with minor gaps
     - 4-6: Basic tooling covering essential needs
     - 1-3: Limited or experimental tooling
- **Required Testing**: Catalog and testing of available tools
- **Implementation**: Tool inventory and quality assessment integrated with ProofLab

#### U3: Language and Library Support

- **Description**: Breadth and depth of programming language and library support
- **Evaluation Method**:
  1. Count supported languages and evaluate implementation quality
  2. Assess availability of standard libraries and cryptographic primitives
  3. Score based on support breadth and depth:
     - 10: Multiple languages with full-featured libraries
     - 7-9: Good support for common languages and libraries
     - 4-6: Limited language support with basic libraries
     - 1-3: Single language with minimal library support
- **Required Testing**: Catalog of supported languages and libraries with quality assessment
- **Implementation**: Language and library database maintained within ProofLab

#### U4: Documentation Quality

- **Description**: Comprehensiveness and clarity of documentation
- **Evaluation Method**:
  1. Assess documentation completeness, accuracy, and organization
  2. Evaluate availability of tutorials, examples, and reference materials
  3. Score based on documentation quality:
     - 10: Comprehensive, well-organized documentation with excellent examples
     - 7-9: Good documentation with minor gaps
     - 4-6: Adequate documentation covering core functionality
     - 1-3: Sparse or outdated documentation
- **Required Testing**: Documentation review by multiple evaluators
- **Implementation**: Documentation quality assessment rubric integrated with ProofLab

#### U5: Ecosystem Vibrancy

- **Description**: Health and activity of the zkVM's developer ecosystem
- **Evaluation Method**:
  1. Measure community size, activity, and contributions
  2. Assess availability of third-party tools and libraries
  3. Evaluate commercial adoption and support
  4. Score based on ecosystem health:
     - 10: Large, active ecosystem with commercial support
     - 7-9: Growing ecosystem with good community engagement
     - 4-6: Developing ecosystem with moderate activity
     - 1-3: Small or inactive ecosystem
- **Required Testing**: Community metrics gathering and analysis
- **Implementation**: Automated community health monitoring integrated with ProofLab

## Radar Chart Visualization

The framework produces a radar chart with 15 axes (5 each for Performance, Security, and Usability), each scored from 1-10. This visualization enables at-a-glance comparison between zkVMs while preserving the detailed assessment behind each score.

### Sample Radar Chart Structure

```
Axes, clockwise from the top (one per metric, each scored 1-10):

    P1  P2  P3  P4  P5   (Performance)
    S1  S2  S3  S4  S5   (Security)
    U1  U2  U3  U4  U5   (Usability)

The three metric families occupy three contiguous sectors of the chart,
so a zkVM's performance, security, and usability profiles appear as
distinct regions.
```

This structured evaluation approach enables zkVMs to be compared objectively across multiple dimensions, with each metric having a clear, reproducible evaluation methodology.

## Integration with ProofLab

The evaluation framework will be integrated with ProofLab through several key components, enabling automated assessment and visualization of zkVM characteristics.

### 1. Benchmarking Module in ProofLab Core

ProofLab will include a dedicated `benchmarking` module that implements standardized tests based on the metrics defined in this framework. This module will interface directly with ProofLab's existing zkVM backends and telemetry systems.
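As a sketch of what this module's core interface might look like (all names are assumptions; the actual API will be defined during Phase 1):

```rust
// Hypothetical core types for src/benchmarking/ -- illustrative only.
use std::time::Duration;

/// Raw measurements from one benchmark run on one zkVM backend.
pub struct BenchmarkResult {
    pub zkvm: String,              // e.g. "risc0" or "sp1"
    pub program: String,           // benchmark program identifier
    pub proving_time: Duration,    // feeds P1
    pub proof_size_bytes: u64,     // feeds P5
    pub peak_memory_bytes: u64,    // feeds P3
}

/// Implemented once per supported zkVM backend, so the suite can run the
/// same programs everywhere and compare like with like.
pub trait BenchmarkTarget {
    fn name(&self) -> &str;
    fn run(&self, program: &str)
        -> Result<BenchmarkResult, Box<dyn std::error::Error>>;
}
```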
**Integration Points:**
- **`src/benchmarking/mod.rs`**: Main module defining the benchmark suite architecture
- **`src/benchmarking/metrics.rs`**: Implementation of specific metrics collection
- **`src/benchmarking/suite.rs`**: Standard benchmark programs and execution engine
- **`src/benchmarking/reporting.rs`**: Results processing and visualization

**Features:**
- Consistent test suite executed across all supported zkVMs (RISC0, SP1, etc.)
- Automated collection of all 15 metrics defined in this framework
- Cross-zkVM comparison with normalization of results
- Historical tracking of performance changes

### 2. Telemetry System Extensions

The existing telemetry system in ProofLab (`src/telemetry.rs`) will be extended to capture the detailed metrics required by this framework.

**Integration Points:**
- **`src/telemetry.rs`**: Enhanced to collect and report all performance metrics
- **`src/risc0.rs` and `src/sp1.rs`**: Instrumented to expose internal performance data
- **`prooflab_io/src/lib.rs`**: Extended to facilitate consistent resource monitoring

**Enhancements:**
- Fine-grained timing of all proof generation phases
- Memory and CPU profiling during execution
- Storage utilization tracking
- Network traffic monitoring when applicable
- Structured metrics output in a standardized JSON format

### 3. Security Assessment Integration

A new security assessment module will be added to evaluate and report on the security aspects of zkVMs.

**Integration Points:**
- **`src/security/mod.rs`**: New module for security evaluation
- **`src/security/tcb.rs`**: TCB size and dependency analysis
- **`src/security/verification.rs`**: Formal verification assessment
- **`src/security/testing.rs`**: Security testing metrics collection

**Capabilities:**
- Static analysis of zkVM codebases to measure TCB size
- Integration with fuzzing and symbolic execution results
- Security metrics collection and normalization
- Vulnerability history database integration

### 4. Metric Calculation Engine

A core component that transforms raw measurements into standardized 1-10 scores for the radar chart.

**Integration Points:**
- **`src/evaluation/mod.rs`**: New module for metric processing
- **`src/evaluation/normalization.rs`**: Algorithms for converting raw data to scores
- **`src/evaluation/radar.rs`**: Radar chart generation logic

**Process:**
1. Collect raw measurements for each metric from benchmarking runs
2. Apply metric-specific normalization algorithms (detailed in the Specification section)
3. Generate final 1-10 scores for each metric
4. Store results in a structured format for visualization

### 5. Command-Line Interface Extensions

ProofLab's CLI will be enhanced with commands specifically for running evaluations and displaying results.

**Integration Points:**
- **`src/main.rs`**: New command-line options for evaluation
- **`src/cli/evaluate.rs`**: Implementation of evaluation subcommands

**New Commands:**
- `prooflab evaluate <zkvm>`: Run a complete evaluation of a specific zkVM
- `prooflab compare <zkvm1> <zkvm2>`: Compare two zkVMs with a radar chart
- `prooflab benchmark <benchmark>`: Run a specific benchmark across all zkVMs
- `prooflab report`: Generate a comprehensive evaluation report

### 6. Visualization and Reporting

The framework will include built-in visualization capabilities for radar charts and detailed reports.
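For the terminal output path, a bar-per-axis fallback is the simplest useful rendering; the sketch below is purely illustrative, with hypothetical names. A full radar renderer would live in `src/reporting/charts.rs`.

```rust
/// Illustrative terminal fallback: one horizontal bar per radar axis.
fn print_score_bars(scores: &[(&str, u8)]) {
    for (axis, score) in scores {
        // Each axis carries a 1-10 score; draw one block per point.
        let bar: String = "█".repeat(*score as usize);
        println!("{axis:>3} │ {bar:<10} {score}/10");
    }
}

fn main() {
    // Example usage with made-up scores for a handful of axes.
    print_score_bars(&[("P1", 8), ("P2", 6), ("S1", 9), ("U1", 5)]);
}
```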
**Integration Points:**
- **`src/reporting/mod.rs`**: New module for report generation
- **`src/reporting/charts.rs`**: Radar chart and other visualization logic
- **`src/reporting/templates/`**: Report templates directory

**Output Formats:**
- Terminal-based radar charts for CLI usage
- HTML reports with interactive visualizations
- JSON data export for integration with other tools
- PDF reports for formal documentation

### 7. Configuration System

A flexible configuration system to allow customization of the evaluation framework.

**Integration Points:**
- **`src/config/evaluation.rs`**: Evaluation framework configuration
- **`src/config/benchmarks.rs`**: Benchmark suite configuration

**Customizable Parameters:**
- Benchmark selection and weighting
- Hardware specification normalization
- Metric thresholds and scoring parameters
- Reporting format preferences

## Benefits

Integrating this evaluation framework into ProofLab will provide several benefits:

1. **Standardization**: Establish common metrics for evaluating zkVMs, with discrete numerical values
2. **Transparency**: Provide users with objective data for decision-making through visualizations
3. **Guidance**: Help developers focus on improving specific aspects of zkVMs based on detailed scores
4. **Progress Tracking**: Measure and celebrate advancements in zkVM technology over time
5. **Ecosystem Support**: Encourage healthy competition among zkVM implementations through fair comparisons

## Implementation Timeline

The implementation of this framework is proposed in three phases:

1. **Phase 1 (1-2 months)**:
   - Implement core benchmarking infrastructure
   - Develop Performance metrics (P1-P5)
   - Create basic radar chart visualization

2. **Phase 2 (2-3 months)**:
   - Add Security metrics (S1-S5)
   - Develop security assessment tools
   - Enhance the telemetry system for detailed metrics
   - Improve visualization with comparative features

3. **Phase 3 (3-4 months)**:
   - Implement Usability metrics (U1-U5)
   - Develop the comprehensive reporting system
   - Create public dashboards for ecosystem comparison
   - Release documentation and integration guides

## Conclusion

This RFC proposes a comprehensive evaluation framework for zkVMs that resolves each assessment criterion into a discrete numerical value (1-10) suitable for radar chart visualization. By implementing this framework, ProofLab will provide objective, comparable metrics across all dimensions of zkVM performance, security, and usability.

The integration with ProofLab's existing architecture ensures that this evaluation framework will become a core feature of the ecosystem, providing valuable insights to developers, researchers, and users of zero-knowledge proof technology.