Federated-Learning-MLC/PrivacyBench


PrivacyBench CLI Framework

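A CLI framework for benchmarking privacy-preserving machine learning. It converts notebook-based experiments into a modular, reproducible command-line interface. The current release (Phase 1) covers the CLI foundation and configuration; experiment execution arrives in Phase 3.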

🚀 Quick Start

Installation

# Install uv (optional; only needed if you manage your environment with uv)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install PrivacyBench in development (editable) mode, from the repository root
pip install -e .

# Verify installation
privacybench --help

Basic Usage

# List available experiments, datasets, models
privacybench list all

# Run CNN baseline experiment
privacybench run --experiment cnn_baseline --dataset alzheimer

# Run federated learning experiment  
privacybench run --experiment fl_cnn --dataset alzheimer

# Validate configuration file
privacybench validate --config configs/experiments/baselines/cnn_alzheimer.yaml

# Dry run (validate without execution)
privacybench run --experiment vit_baseline --dataset skin_lesions --dry-run

📋 Available Commands

privacybench run

Execute privacy-preserving ML experiments

Options:

  • --experiment: Experiment type (cnn_baseline, vit_baseline, fl_cnn, dp_cnn, etc.)
  • --dataset: Dataset to use (alzheimer, skin_lesions)
  • --config: Custom YAML configuration file
  • --output: Output directory for results (default: ./results)
  • --dry-run: Validate configuration without execution
  • --seed: Random seed for reproducibility (default: 42)

Examples:

privacybench run --experiment cnn_baseline --dataset alzheimer
privacybench run --experiment fl_dp_cnn --dataset skin_lesions --output ./my_results
privacybench run --config experiments/custom_config.yaml

privacybench list

Display available components

Arguments (positional):

  • experiments: List available experiments
  • datasets: List available datasets
  • models: List available model architectures
  • privacy: List available privacy techniques
  • all: List everything (default)

Examples:

privacybench list experiments
privacybench list datasets
privacybench list all

privacybench validate

Validate experiment configurations

Options:

  • --config: YAML configuration file to validate (required)
  • --verbose: Show detailed validation information

Examples:

privacybench validate --config configs/experiments/baselines/cnn_alzheimer.yaml
privacybench validate --config my_experiment.yaml --verbose
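The three commands above could be wired together roughly as follows. This is a minimal sketch using the standard-library argparse module, not the project's actual cli/main.py. The choice lists are assumptions copied from the option descriptions and the Supported Experiments table.

```python
import argparse

# Hypothetical choice lists, copied from the documented options.
EXPERIMENTS = ["cnn_baseline", "vit_baseline", "fl_cnn", "fl_vit",
               "dp_cnn", "dp_vit", "fl_dp_cnn", "smpc_cnn"]
DATASETS = ["alzheimer", "skin_lesions"]
LIST_TARGETS = ["experiments", "datasets", "models", "privacy", "all"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented run/list/validate commands."""
    parser = argparse.ArgumentParser(prog="privacybench")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="Execute an experiment")
    run.add_argument("--experiment", choices=EXPERIMENTS)
    run.add_argument("--dataset", choices=DATASETS)
    run.add_argument("--config", help="Custom YAML configuration file")
    run.add_argument("--output", default="./results")
    run.add_argument("--dry-run", action="store_true")
    run.add_argument("--seed", type=int, default=42)

    lst = sub.add_parser("list", help="Display available components")
    # Positional and optional, so a bare "privacybench list" means "all".
    lst.add_argument("target", nargs="?", choices=LIST_TARGETS, default="all")

    val = sub.add_parser("validate", help="Validate a configuration file")
    val.add_argument("--config", required=True)
    val.add_argument("--verbose", action="store_true")
    return parser


args = build_parser().parse_args(
    ["run", "--experiment", "cnn_baseline", "--dataset", "alzheimer", "--dry-run"]
)
print(args.command, args.experiment, args.dry_run, args.seed)
```

Passing an explicit argument list to parse_args keeps the sketch runnable outside a shell; a real entry point would call parse_args() with no arguments.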

🎯 Supported Experiments

| Experiment   | Model       | Privacy              | Dataset Support         | Expected Accuracy  |
|--------------|-------------|----------------------|-------------------------|--------------------|
| cnn_baseline | ResNet18    | None                 | alzheimer, skin_lesions | 97.9% (alzheimer)  |
| vit_baseline | ViT-Base/16 | None                 | alzheimer, skin_lesions | 99.0% (alzheimer)  |
| fl_cnn       | ResNet18    | Federated Learning   | alzheimer, skin_lesions | ~98.0% (alzheimer) |
| fl_vit       | ViT-Base/16 | Federated Learning   | alzheimer, skin_lesions | TBD                |
| dp_cnn       | ResNet18    | Differential Privacy | alzheimer, skin_lesions | TBD                |
| dp_vit       | ViT-Base/16 | Differential Privacy | alzheimer, skin_lesions | TBD                |
| fl_dp_cnn    | ResNet18    | FL + DP              | alzheimer, skin_lesions | TBD                |
| smpc_cnn     | ResNet18    | SMPC                 | alzheimer, skin_lesions | TBD                |
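One way to hold this table in code is a small lookup keyed by experiment name. The sketch below is illustrative only: the ExperimentSpec class and the "smpc" technique identifier are assumptions, while "federated_learning" and "differential_privacy" follow the names used in the configuration examples.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExperimentSpec:
    """Hypothetical record for one row of the experiments table."""
    model: str
    privacy: Tuple[str, ...]  # empty tuple means baseline (no privacy)


FL = "federated_learning"
DP = "differential_privacy"
SMPC = "smpc"  # assumed identifier; not confirmed by the source

EXPERIMENT_TABLE: Dict[str, ExperimentSpec] = {
    "cnn_baseline": ExperimentSpec("ResNet18", ()),
    "vit_baseline": ExperimentSpec("ViT-Base/16", ()),
    "fl_cnn": ExperimentSpec("ResNet18", (FL,)),
    "fl_vit": ExperimentSpec("ViT-Base/16", (FL,)),
    "dp_cnn": ExperimentSpec("ResNet18", (DP,)),
    "dp_vit": ExperimentSpec("ViT-Base/16", (DP,)),
    "fl_dp_cnn": ExperimentSpec("ResNet18", (FL, DP)),
    "smpc_cnn": ExperimentSpec("ResNet18", (SMPC,)),
}


def describe(name: str) -> str:
    """Return a one-line summary for an experiment name."""
    spec = EXPERIMENT_TABLE[name]
    privacy = " + ".join(spec.privacy) if spec.privacy else "none"
    return f"{name}: {spec.model}, privacy={privacy}"


print(describe("fl_dp_cnn"))
```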

📊 Datasets

Alzheimer MRI Classification

  • Classes: 4 (NonDemented, VeryMildDemented, MildDemented, ModerateDemented)
  • Size: ~6,400 images
  • Type: Medical imaging
  • Usage: --dataset alzheimer

ISIC Skin Lesion Classification

  • Classes: 8 skin lesion types
  • Size: ~10,000 images
  • Type: Medical imaging
  • Usage: --dataset skin_lesions

🔒 Privacy Techniques

Baseline (No Privacy)

  • Standard training without privacy constraints
  • Fastest training; provides the reference accuracy against which the privacy techniques are compared
  • Use: cnn_baseline, vit_baseline

Federated Learning (FL)

  • Distributed training without sharing raw data
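  • Moderate privacy (raw data stays on the clients, but shared model updates carry no formal guarantee); utility is usually close to baseline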
  • Use: fl_cnn, fl_vit
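The aggregation step behind federated averaging (the "FedAvg" strategy named in the configuration examples) can be sketched in plain Python. This is a toy illustration on flat parameter lists, not the Flower implementation the project builds on; the function name and inputs are hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local example counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * size / total
    return averaged


# Three clients; the third holds half of all training examples.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
print(fed_avg(clients, sizes))  # [3.5, 4.5]
```

Only the parameter vectors and example counts reach the server; the raw images never leave the clients.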

Differential Privacy (DP)

  • Mathematical privacy guarantees via noise injection
  • Strong privacy, reduced utility
  • Use: dp_cnn, dp_vit
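The noise injection mentioned above usually follows the DP-SGD recipe: clip each per-sample gradient, sum, add Gaussian noise, then average. The sketch below shows that step on plain lists. It is not the Opacus API; the parameter names max_grad_norm and noise_multiplier are borrowed from the configuration example, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], max_grad_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_sample_grads: Sequence[Sequence[float]],
                        max_grad_norm: float,
                        noise_multiplier: float,
                        rng: random.Random) -> List[float]:
    """Clip per-sample gradients, add Gaussian noise to their sum, then average."""
    clipped = [clip_gradient(g, max_grad_norm) for g in per_sample_grads]
    sigma = noise_multiplier * max_grad_norm  # noise std scales with the clip bound
    dim = len(clipped[0])
    noisy_sum = [sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
                 for i in range(dim)]
    return [s / len(clipped) for s in noisy_sum]


grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_average_gradient(grads, max_grad_norm=1.0, noise_multiplier=1.0,
                          rng=random.Random(42)))
```

Clipping bounds how much any single example can move the update, which is what lets the added noise translate into an (epsilon, delta) guarantee. A larger noise_multiplier gives stronger privacy and lower utility.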

Secure Multi-Party Computation (SMPC)

  • Cryptographic secure aggregation
  • Very strong privacy, significant overhead
  • Use: smpc_cnn, smpc_vit (smpc_vit does not yet appear in the Supported Experiments table)
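Secure aggregation can be illustrated with additive secret sharing: each client splits its value into random shares that sum to the value modulo a prime, so no single party sees an individual contribution. This is a toy sketch of the idea, not the protocol PrivacyBench uses; the modulus and function names are assumptions.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # field modulus (assumed; any sufficiently large prime works)


def make_shares(secret: int, n: int) -> List[int]:
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def secure_sum(client_values: List[int]) -> int:
    """Sum client values with every client also acting as an aggregating party."""
    n = len(client_values)
    # all_shares[i][j] is the share client i sends to party j.
    all_shares = [make_shares(v, n) for v in client_values]
    # Each party adds up the shares it received; a partial sum alone looks random.
    partial = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the partial sums reveals only the total.
    return sum(partial) % PRIME


print(secure_sum([5, 12, 30]))  # 47
```

Model updates are floating-point, so a real system would first encode them as fixed-point integers. The extra communication between parties is the main source of the overhead noted above.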

Hybrid Combinations

  • FL + DP: Combined federated training with differential privacy
  • FL + SMPC: Federated training with cryptographic security
  • Use: fl_dp_cnn, fl_smpc_cnn (fl_smpc_cnn does not yet appear in the Supported Experiments table)

πŸ“ Project Structure

privacybench/
β”œβ”€β”€ cli/                    # CLI interface and commands
β”œβ”€β”€ core/                   # Component registry and wrappers (Phase 2)
β”œβ”€β”€ execution/              # Experiment execution engine (Phase 3)
β”œβ”€β”€ output/                 # Results collection and export (Phase 4)
β”œβ”€β”€ utils/                  # Utilities and helpers
β”œβ”€β”€ legacy/                 # Preserved existing code
β”œβ”€β”€ configs/                # YAML configuration files
β”œβ”€β”€ tests/                  # Test suite (Phase 5)
└── examples/               # Usage examples

βš™οΈ Configuration

YAML Configuration Format

metadata:
  name: "cnn_baseline_alzheimer"
  experiment_type: "cnn_baseline"
  dataset: "alzheimer"

dataset:
  name: "alzheimer"
  config:
    augmentation: true
    test_split: 0.08
    validation_split: 0.1

model:
  architecture: "cnn"
  config:
    pretrained: true
    num_classes: 4
    dropout: 0.1

privacy:
  techniques: []  # No privacy for baseline

training:
  epochs: 50
  batch_size: 32
  learning_rate: 0.0002
  optimizer: "adam"
  tolerance: 7

output:
  directory: "./results"
  save_model: true
  export_formats: ["json", "csv"]
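A validator for this layout might look like the sketch below. It works on the dictionary a YAML loader would return and checks only a few rules. The rule set and function name are assumptions, not the project's actual validation logic.

```python
from typing import Any, Dict, List

REQUIRED_SECTIONS = ("metadata", "dataset", "model", "privacy", "training", "output")


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in config]
    if errors:
        return errors

    if config["metadata"].get("dataset") != config["dataset"].get("name"):
        errors.append("metadata.dataset does not match dataset.name")

    training = config["training"]
    epochs = training.get("epochs")
    if not isinstance(epochs, int) or epochs <= 0:
        errors.append("training.epochs must be a positive integer")
    lr = training.get("learning_rate")
    if not isinstance(lr, (int, float)) or lr <= 0:
        errors.append("training.learning_rate must be a positive number")

    splits = config["dataset"].get("config", {})
    if splits.get("test_split", 0) + splits.get("validation_split", 0) >= 1:
        errors.append("test_split + validation_split must be below 1")
    return errors


# The YAML example above, written out as the dict a loader would produce.
EXAMPLE = {
    "metadata": {"name": "cnn_baseline_alzheimer",
                 "experiment_type": "cnn_baseline", "dataset": "alzheimer"},
    "dataset": {"name": "alzheimer",
                "config": {"augmentation": True, "test_split": 0.08,
                           "validation_split": 0.1}},
    "model": {"architecture": "cnn",
              "config": {"pretrained": True, "num_classes": 4, "dropout": 0.1}},
    "privacy": {"techniques": []},
    "training": {"epochs": 50, "batch_size": 32, "learning_rate": 0.0002,
                 "optimizer": "adam", "tolerance": 7},
    "output": {"directory": "./results", "save_model": True,
               "export_formats": ["json", "csv"]},
}
print(validate_config(EXAMPLE))  # []
```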

Privacy Configuration Examples

Federated Learning:

privacy:
  techniques:
    - name: "federated_learning"
      config:
        num_clients: 3
        num_rounds: 5
        strategy: "FedAvg"

Differential Privacy:

privacy:
  techniques:
    - name: "differential_privacy"
      config:
        epsilon: 1.0
        delta: 1e-5
        noise_multiplier: 1.0
        max_grad_norm: 1.0

Combined FL + DP:

privacy:
  techniques:
    - name: "federated_learning"
      config:
        num_clients: 3
        num_rounds: 5
    - name: "differential_privacy"
      config:
        epsilon: 1.0
        delta: 1e-5
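The technique entries can be sanity-checked before a run. The sketch below is hypothetical, and its bounds are conventional choices rather than documented PrivacyBench rules. One caveat worth knowing: if the file is loaded with PyYAML (an assumption about the loader), a value written as 1e-5 arrives as the string "1e-5" rather than a float, so the check coerces numeric fields explicitly.

```python
from typing import Any, Dict, List


def check_techniques(techniques: List[Dict[str, Any]]) -> List[str]:
    """Return problems found in a privacy.techniques list (empty means OK)."""
    problems: List[str] = []
    for tech in techniques:
        name = tech.get("name")
        cfg = tech.get("config", {})
        if name == "federated_learning":
            if int(cfg.get("num_clients", 0)) < 2:
                problems.append("federated_learning needs at least 2 clients")
            if int(cfg.get("num_rounds", 0)) < 1:
                problems.append("federated_learning needs at least 1 round")
        elif name == "differential_privacy":
            # float() also accepts strings such as "1e-5".
            if float(cfg.get("epsilon", 0)) <= 0:
                problems.append("epsilon must be positive")
            if not 0 < float(cfg.get("delta", 0)) < 1:
                problems.append("delta must lie strictly between 0 and 1")
        else:
            problems.append(f"unknown technique: {name}")
    return problems


combined = [
    {"name": "federated_learning", "config": {"num_clients": 3, "num_rounds": 5}},
    {"name": "differential_privacy", "config": {"epsilon": 1.0, "delta": "1e-5"}},
]
print(check_techniques(combined))  # []
```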

📈 Output and Results

CLI Output

🎉 EXPERIMENT COMPLETED SUCCESSFULLY
═══════════════════════════════════════
📋 Experiment: cnn_baseline_alzheimer
📊 Dataset: alzheimer
🧠 Model: cnn
🔒 Privacy: None (Baseline)
⏱️ Duration: 588.0 seconds

📈 PERFORMANCE METRICS:
 • Accuracy: 97.90%
 • F1 Score: 0.9785
 • ROC AUC: 0.9958

⚡ RESOURCE CONSUMPTION:
 • Training Time: 588.0 seconds
 • Peak GPU Memory: 1.20 GB
 • Energy Consumed: 0.026000 kWh
 • CO2 Emissions: 0.011830 kg

📁 Results saved to: ./results/exp_20250131_123456

File Outputs

results/exp_20250131_123456/
├── results.json           # Complete results
├── metrics.csv            # Key metrics table
├── summary.md             # Human-readable summary
├── config.yaml            # Experiment configuration
└── emissions.csv          # CodeCarbon energy data
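Writing the first two of these files needs only the standard library. The sketch below assumes a results dictionary with a "metrics" key; that shape and the function name are illustrative, not the project's actual export code.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_results(results: dict, out_dir: Path) -> None:
    """Write results.json (full payload) and metrics.csv (metric/value rows)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "results.json").write_text(json.dumps(results, indent=2))
    with open(out_dir / "metrics.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["metric", "value"])
        for name, value in results["metrics"].items():
            writer.writerow([name, value])


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "exp_demo"
    export_results({"experiment": "cnn_baseline_alzheimer",
                    "metrics": {"accuracy": 0.979, "f1": 0.9785}}, target)
    print(sorted(p.name for p in target.iterdir()))
```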

🔬 Development Status

Phase 1 (Current): ✅ CLI Foundation & Configuration

  • CLI entry point and command structure
  • YAML configuration parsing and validation
  • Integration with existing experiments.yaml
  • Commands: privacybench list, privacybench validate, privacybench run --dry-run

Phase 2 (Next): 🔧 Component System & Wrappers

  • Component registry for datasets, models, privacy techniques
  • Wrapper implementations around existing code
  • Command: Full privacybench run without execution

Phase 3 (Future): 🚀 Execution Engine

  • Complete experiment orchestration
  • Integration with existing training code
  • Command: Full experiment execution

Phase 4 (Future): 📊 Results & Export System

  • Results collection and formatting
  • Multiple export formats and comparisons

Phase 5 (Future): 🧪 Testing & Production Polish

  • Comprehensive test suite
  • Production-ready packaging

🤝 Contributing

Development Setup

# Clone and install in development mode
git clone <repository>
cd PrivacyBench  # directory name follows the repository name
pip install -e .

# Run tests (Phase 5)
pytest tests/

# Validate installation
privacybench --help
privacybench list all

Adding New Experiments

  1. Add experiment to legacy/experiments.yaml
  2. Map CLI name in cli/parser.py
  3. Add to choices in cli/main.py
  4. Test: privacybench run --experiment new_experiment --dry-run

Adding New Datasets

  1. Create wrapper in core/datasets/ (Phase 2)
  2. Register in component registry
  3. Add validation rules
  4. Test with existing experiments
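Since the component registry is planned for Phase 2, its interface is not yet fixed. The sketch below shows one common shape for it, a decorator that records dataset factories by name. Every name here is hypothetical; the class counts come from the Datasets section.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a dataset name to a factory returning its metadata.
DATASET_REGISTRY: Dict[str, Callable[[], dict]] = {}


def register_dataset(name: str):
    """Decorator that records a dataset factory under the given CLI name."""
    def decorator(factory: Callable[[], dict]) -> Callable[[], dict]:
        if name in DATASET_REGISTRY:
            raise ValueError(f"dataset already registered: {name}")
        DATASET_REGISTRY[name] = factory
        return factory
    return decorator


@register_dataset("alzheimer")
def alzheimer() -> dict:
    return {"name": "alzheimer", "num_classes": 4}


@register_dataset("skin_lesions")
def skin_lesions() -> dict:
    return {"name": "skin_lesions", "num_classes": 8}


print(sorted(DATASET_REGISTRY))  # ['alzheimer', 'skin_lesions']
```

Rejecting duplicate names at registration time surfaces wiring mistakes early, before any experiment tries to resolve the dataset.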

📖 Documentation

  • Quick Start: This README
  • API Reference: Coming in Phase 5
  • Configuration Guide: examples/ directory
  • Development Guide: Coming in Phase 5

📄 License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

Built on top of existing PrivacyBench research codebase with:

  • PyTorch Lightning for training infrastructure
  • Flower for federated learning
  • Opacus for differential privacy
  • CodeCarbon for energy tracking
  • Weights & Biases for experiment logging

Note: This is the Phase 1 implementation, focused on the CLI foundation and configuration. Actual experiment execution will be added in Phase 3.

About

Privacy benchmark suite for real-world trade-offs in PPML
