**Warning:** This is a statistical experiment. This code has not been audited or reviewed and is not suitable for production whatsoever. USE AT YOUR OWN RISK.
A tool to analyze the bandwidth impact of adding transparent transaction data (CompactTxIn and TxOut) to Zcash's compact block protocol.
This analyzer helps evaluate the proposed changes to the lightwallet-protocol by measuring:
- Current compact block sizes from production lightwalletd
- Estimated sizes with transparent input/output data added
- Bandwidth impact on light clients syncing block ranges
The tool fetches real compact blocks from lightwalletd and compares them against estimated sizes calculated from full block data in Zebrad, giving accurate projections of the bandwidth impact.
The Zcash light client protocol currently omits transparent transaction inputs and outputs from compact blocks. PR #1 proposes adding:
- `CompactTxIn` - references to transparent inputs being spent
- `TxOut` - transparent outputs being created
This analysis helps decide whether to:
- Make transparent data part of the default `GetBlockRange` RPC
- Create a separate opt-in method for clients that need it
- Use pool-based filtering (as implemented in librustzcash PR #1781)
- Rust 1.70+
- Zebrad - synced Zcash full node with RPC enabled
- Lightwalletd - connected to your Zebrad instance
```bash
# Create the project
cargo new compact_block_analyzer
cd compact_block_analyzer

# Create proto directory
mkdir proto

# Clone the lightwallet-protocol repository
git clone https://github.com/zcash/lightwallet-protocol.git
cd lightwallet-protocol

# Checkout the PR with transparent data additions
git fetch origin pull/1/head:pr-1
git checkout pr-1

# Copy proto files to your project
cp compact_formats.proto ../compact_block_analyzer/proto/
cp service.proto ../compact_block_analyzer/proto/
cd ../compact_block_analyzer
```

Create `build.rs` in the project root:
```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // Client only
        .compile(
            &["proto/service.proto", "proto/compact_formats.proto"],
            &["proto/"],
        )?;
    Ok(())
}
```

Replace `Cargo.toml` with the dependencies from the artifact, or copy `src/main.rs` from the artifact, which includes the dependency list.
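For orientation, a minimal `Cargo.toml` sketch consistent with the `build.rs` above; the crate versions and exact dependency set here are assumptions, so prefer the artifact's list:

```toml
[package]
name = "compact_block_analyzer"
version = "0.1.0"
edition = "2021"

[dependencies]
tonic = "0.9"                                        # gRPC client (assumed version)
prost = "0.11"                                       # protobuf runtime
tokio = { version = "1", features = ["full"] }       # async runtime
reqwest = { version = "0.11", features = ["json"] }  # Zebrad JSON-RPC (assumed crate)
serde_json = "1"

[build-dependencies]
tonic-build = "0.9"
```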
```bash
cargo build --release
```

```bash
# If not already running
zebrad start
```

Verify RPC is accessible:

```bash
curl -X POST http://127.0.0.1:8232 \
  -H "Content-Type: application/json" \
  -d '{"method":"getblockcount","params":[],"id":1}'
```

```bash
lightwalletd \
  --grpc-bind-addr=127.0.0.1:9067 \
  --zcash-conf-path=/path/to/zebra.conf \
  --log-file=/dev/stdout
```

Verify lightwalletd is running:

```bash
# With grpcurl installed
grpcurl -plaintext localhost:9067 list

# Should show: cash.z.wallet.sdk.rpc.CompactTxStreamer
```

The tool supports multiple analysis modes:
```bash
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  quick \
  quick_results.csv
```

```bash
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  recommended \
  results.csv
```

This uses a hybrid sampling strategy (see the sketch after this list):
- 750 samples from each protocol era (pre-Sapling, Sapling, Canopy, NU5)
- 2,000 additional samples from recent blocks (last 100K)
- Provides balanced historical context with a focus on current usage
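A minimal sketch of how such a hybrid sample could be drawn; the era activation heights and the `rand` usage are illustrative assumptions, not the tool's actual code:

```rust
use rand::Rng; // rand = "0.8" assumed

// Mainnet era boundaries as start..end heights; NU5 runs from its
// activation height to the chain tip.
const ERAS: &[(&str, u64, u64)] = &[
    ("pre_sapling", 1, 419_200),
    ("sapling", 419_200, 1_046_400),
    ("canopy", 1_046_400, 1_687_104),
];

/// e.g. hybrid_sample(2_450_000, 750, 2_000) for the "recommended" mode.
fn hybrid_sample(tip: u64, per_era: usize, recent: usize) -> Vec<u64> {
    let mut rng = rand::thread_rng();
    let mut heights = Vec::new();
    // Fixed quota from each historical era...
    for &(_, start, end) in ERAS {
        for _ in 0..per_era {
            heights.push(rng.gen_range(start..end));
        }
    }
    // ...the same quota from the NU5 era...
    for _ in 0..per_era {
        heights.push(rng.gen_range(1_687_104..tip));
    }
    // ...plus extra samples concentrated in the last 100K blocks.
    for _ in 0..recent {
        heights.push(rng.gen_range(tip - 100_000..tip));
    }
    heights.sort_unstable();
    heights
}
```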
```bash
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  thorough \
  thorough_results.csv
```

```bash
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  equal \
  equal_results.csv
```

```bash
# Samples proportionally to era size
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  proportional \
  proportional_results.csv
```

```bash
# Custom weights favoring recent blocks
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  weighted \
  weighted_results.csv
```

```bash
cargo run --release -- \
  http://127.0.0.1:9067 \
  http://127.0.0.1:8232 \
  range 2400000 2401000 \
  range_results.csv
```

Why use sampling? The Zcash blockchain has 2.4M+ blocks. Analyzing every block would take days; statistical sampling gives accurate results in minutes.
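As a back-of-envelope check on why a few thousand samples suffice; the 10-point standard deviation here is an illustrative assumption, not a measured value:

```rust
/// 95% confidence half-width for a sampled mean (normal approximation).
fn ci_half_width(sigma: f64, n: usize) -> f64 {
    1.96 * sigma / (n as f64).sqrt()
}

fn main() {
    // With a per-block overhead std dev of ~10 percentage points,
    // 5,000 samples pin the mean to within about ±0.28 points.
    println!("±{:.2}%", ci_half_width(10.0, 5_000));
}
```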
| Strategy | Description | Best For |
|---|---|---|
| Quick | Fast overview with fewer samples | Initial exploration |
| Recommended | Balanced approach with recent focus | Most use cases |
| Thorough | Comprehensive coverage | Final analysis |
| Equal | Same samples per era | Era comparison |
| Proportional | Samples match blockchain distribution | Representing whole chain |
| Weighted | More recent, less historical | Current state focus |
After running the analysis, generate charts and statistics:
```bash
# Install Python dependencies
pip install -r requirements.txt

# Generate all visualizations
python visualize.py results.csv --output-dir ./charts
```

This creates:
- distribution.png - Histogram and box plot of overhead
- time_series.png - Overhead trends over blockchain height
- by_era.png - Comparison across protocol eras
- correlations.png - Relationship between overhead and transaction characteristics
- cumulative.png - Cumulative distribution functions
- bandwidth_impact.png - Practical bandwidth scenarios
- heatmap.png - Overhead by era and transaction patterns
- statistical_report.txt - Comprehensive statistical analysis
Console output during analysis:
```text
Current blockchain tip: 2450000
Sampling Strategy: HybridRecent
Total samples: 5000

Distribution by era:
  pre_sapling: 750 samples (15.0% of total, 1 in 559 blocks)
  sapling: 750 samples (15.0% of total, 1 in 836 blocks)
  canopy: 750 samples (15.0% of total, 1 in 854 blocks)
  nu5: 2750 samples (55.0% of total, 1 in 295 blocks)

Analyzing 5000 blocks...
Progress: 0/5000 (0.0%)
Progress: 500/5000 (10.0%)
...

=== ANALYSIS SUMMARY ===
Blocks analyzed: 5000

Current compact blocks:
  Total: 76.23 MB

With transparent data:
  Estimated total: 93.45 MB
  Delta: +17.22 MB
  Overall increase: 22.58%

Per-block statistics:
  Median increase: 18.45%
  95th percentile: 35.21%
  Min: 5.32%
  Max: 47.83%

Practical impact:
  Current daily sync (~2880 blocks): 43.86 MB
  With transparent: 53.75 MB
  Additional bandwidth per day: 9.89 MB
```
Statistical report snippet:
```text
DECISION FRAMEWORK
--------------------------------------------------------------------------------
Median overhead: 18.5%
95th percentile: 35.2%

RECOMMENDATION: LOW IMPACT
The overhead is relatively small (<20%). Consider making transparent
data part of the default GetBlockRange method. This would:
- Simplify the API (single method)
- Provide feature parity with full nodes
- Have minimal bandwidth impact on users
```
The tool provides real-time progress and summary statistics during analysis.
Detailed per-block analysis in CSV format:
```csv
height,era,current_compact_size,estimated_with_transparent,delta_bytes,delta_percent,tx_count,transparent_inputs,transparent_outputs
2400000,nu5,15234,18456,3222,21.15,45,12,89
2400001,nu5,12890,14234,1344,10.43,32,5,43
...
```
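If you want to post-process the CSV outside the Python script, a sketch using the `csv` and `serde` crates (assumed dependencies; the struct mirrors the header row above):

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct Row {
    height: u64,
    era: String,
    current_compact_size: u64,
    estimated_with_transparent: u64,
    delta_bytes: i64,
    delta_percent: f64,
    tx_count: u32,
    transparent_inputs: u32,
    transparent_outputs: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut rdr = csv::Reader::from_path("results.csv")?;
    let mut worst = 0.0_f64;
    for row in rdr.deserialize::<Row>() {
        worst = worst.max(row?.delta_percent);
    }
    println!("max per-block increase: {worst:.2}%");
    Ok(())
}
```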
The Python script generates comprehensive visualizations:

- **Distribution Analysis**
  - Histogram with kernel density estimation
  - Box plot showing quartiles and outliers
  - Marked median and mean values
- **Time Series Analysis**
  - Overhead percentage over blockchain height
  - Absolute size increase over time
  - Rolling averages to show trends
  - Era boundaries marked
- **Era Comparison**
  - Box plots comparing distributions across eras
  - Violin plots showing density
  - Bar charts with standard deviations
  - Sample size distribution
- **Correlation Analysis**
  - Overhead vs. transparent inputs
  - Overhead vs. transparent outputs
  - Overhead vs. transaction count
  - Overhead vs. current block size
- **Cumulative Distribution**
  - CDF of overhead percentages
  - CDF of absolute byte increases
  - Percentile markers (P50, P75, P90, P95, P99)
- **Bandwidth Impact**
  - Daily sync bandwidth comparison
  - Full chain sync comparison
  - Mobile data cost estimates
  - Sync time projections
- **Heatmaps**
  - Overhead by era and transaction count
  - Overhead by era and transparent I/O
- **Statistical Report**
  - Summary statistics with confidence intervals
  - Statistics broken down by era
  - Practical bandwidth calculations
  - Correlation coefficients
  - Decision framework recommendations
- **Fetch real compact block from lightwalletd via gRPC**
  - Gets the actual production compact block size
  - Includes all current fields (Sapling outputs, Orchard actions, etc.)
- **Fetch full block from Zebrad via RPC**
  - Gets transparent input/output data
  - Provides transaction details needed for estimation
- **Calculate overhead using protobuf encoding rules**
  - Estimates size of `CompactTxIn` messages (containing `OutPoint`)
  - Estimates size of `TxOut` messages (value + scriptPubKey)
  - Accounts for protobuf field tags, length prefixes, and nested messages
- **Compare and report**
  - Current size vs. estimated size with transparent data
  - Per-block and aggregate statistics
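To make the first step concrete, here is a hedged sketch using the tonic client generated from `service.proto`; the module path, `BlockId` field names, and endpoint follow lightwalletd's proto conventions but are assumptions, not the tool's actual code:

```rust
use prost::Message;

// Module generated by tonic-build from service.proto; the package name
// `cash.z.wallet.sdk.rpc` is assumed from lightwalletd's proto files.
pub mod rpc {
    tonic::include_proto!("cash.z.wallet.sdk.rpc");
}
use rpc::{compact_tx_streamer_client::CompactTxStreamerClient, BlockId};

/// Fetch one compact block and return its serialized size in bytes.
async fn compact_block_size(height: u64) -> Result<usize, Box<dyn std::error::Error>> {
    let mut client = CompactTxStreamerClient::connect("http://127.0.0.1:9067").await?;
    let block = client
        .get_block(BlockId { height, hash: vec![] })
        .await?
        .into_inner();
    // prost's encoded_len reports the exact wire size of the message.
    Ok(block.encoded_len())
}
```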
The estimator calculates sizes based on the proposed proto definitions:
```protobuf
message OutPoint {
  bytes txid = 1;   // 32 bytes
  uint32 index = 2; // varint
}

message CompactTxIn {
  OutPoint prevout = 1;
}

message TxOut {
  uint32 value = 1;       // varint
  bytes scriptPubKey = 2; // variable length
}
```

Added to `CompactTx`:

```protobuf
repeated CompactTxIn vin = 7;
repeated TxOut vout = 8;
```

The calculation includes:
- Field tags (1 byte per field)
- Length prefixes for bytes and nested messages (varint)
- Actual data sizes
- Nested message overhead
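To make those rules concrete, a sketch of the per-message arithmetic under the field numbers above; the helper names are illustrative, not the tool's actual code:

```rust
/// Bytes needed to encode `n` as a protobuf varint.
fn varint_len(n: u64) -> usize {
    if n == 0 { 1 } else { (64 - n.leading_zeros() as usize + 6) / 7 }
}

/// Length-delimited field: 1-byte tag + length prefix + payload.
fn field_len(payload: usize) -> usize {
    1 + varint_len(payload as u64) + payload
}

/// OutPoint body: txid (tag + len + 32 bytes) + index (tag + varint).
fn outpoint_len(index: u32) -> usize {
    field_len(32) + 1 + varint_len(index as u64)
}

/// CompactTxIn body: the nested OutPoint adds its own tag + length prefix.
fn compact_txin_len(index: u32) -> usize {
    field_len(outpoint_len(index))
}

/// TxOut body: value (tag + varint) + scriptPubKey (tag + len + bytes).
fn txout_len(value: u64, script_len: usize) -> usize {
    1 + varint_len(value) + field_len(script_len)
}

fn main() {
    // One spent input (index 1) plus one P2PKH output (25-byte script);
    // each repeated vin/vout element is again tag + length + body.
    let added = field_len(compact_txin_len(1)) + field_len(txout_len(50_000, 25));
    println!("added bytes for this tx: {added}"); // 40 + 33 = 73
}
```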
- **< 20% increase: Consider making transparent data default**
  - Minimal impact on bandwidth
  - Simplifies API (single method)
  - Better for light client feature parity
- **20-50% increase: Consider a separate opt-in method**
  - Significant but manageable overhead
  - Let clients choose based on their needs
  - Pool filtering could help (librustzcash PR #1781)
- **> 50% increase: Likely needs a separate method**
  - Major bandwidth impact
  - Important for mobile/limited-bandwidth users
  - Clear opt-in for clients that need transparent data
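If you want the framework's recommendation logic as code, a trivial helper using the thresholds above (purely illustrative, not the tool's actual implementation):

```rust
/// Map median overhead (percent) to the decision framework's tier.
fn recommend(median_pct: f64) -> &'static str {
    match median_pct {
        p if p < 20.0 => "default: include transparent data in GetBlockRange",
        p if p <= 50.0 => "opt-in: separate method or pool filtering",
        _ => "separate method required",
    }
}
```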
- Median increase - typical overhead
- 95th percentile - worst-case for active blocks
- Daily bandwidth impact - practical cost for staying synced
- Initial sync impact - multiply by ~2.4M blocks
- Correlation with transparent usage - understand which blocks drive overhead
Port 9067 (lightwalletd):
```bash
# Check if running
ps aux | grep lightwalletd
netstat -tlnp | grep 9067

# Test connection
grpcurl -plaintext localhost:9067 list
```

Port 8232 (zebrad):

```bash
# Check if running
ps aux | grep zebrad
netstat -tlnp | grep 8232

# Test RPC
curl -X POST http://127.0.0.1:8232 \
  -d '{"method":"getblockcount","params":[],"id":1}'
```

Proto compilation fails:

```bash
# Ensure proto files exist
ls -la proto/

# Clean and rebuild
cargo clean
cargo build --release
```

"Block not found" errors:
- Check if block height exists on mainnet
- Verify Zebrad is fully synced
- Ensure lightwalletd has indexed the blocks
The tool includes a 100ms delay between blocks to avoid overwhelming your node. For faster analysis:
- Reduce the delay in the code (see the sketch after this list)
- Run multiple instances for different ranges
- Use a more powerful machine for Zebrad
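The throttle is a single sleep in the per-block loop; a hedged sketch of the pattern to look for (names are illustrative):

```rust
use std::time::Duration;

/// Illustrative per-block throttle; lowering the 100ms constant speeds up
/// analysis at the cost of heavier load on Zebrad and lightwalletd.
async fn throttle() {
    tokio::time::sleep(Duration::from_millis(100)).await;
}
```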
This tool is designed for protocol analysis. Contributions welcome:
- Improved size estimation accuracy
- Additional output formats (JSON, charts)
- Statistical analysis enhancements
- Performance optimizations
- lightwallet-protocol PR #1 - Proto definitions
- librustzcash PR #1781 - Pool filtering implementation
- Zcash Protocol Specification - Full protocol details
Same license as the Zcash lightwallet-protocol project.
For questions or issues:
- Open an issue in this repository
- Discuss on Zcash Community Forum
- Zcash R&D Discord
Built to support analysis for improving Zcash light client protocol bandwidth efficiency.