This repository hosts the code for *PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization* by Sanae Lotfi\*, Marc Finzi\*, Sanyam Kapoor\*, Andres Potapczynski\*, Micah Goldblum, and Andrew Gordon Wilson.
Set up the environment:

```shell
conda env create -f environment.yml -n pactl
```

Set up the `pactl` package:

```shell
pip install -e .
```

We use Fire for CLI parsing.
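Fire exposes the keyword arguments of a script's `main` function directly as CLI flags. A minimal sketch of the pattern (a toy script for illustration, not the actual `experiments/train.py`):

```python
# toy_train.py -- a toy illustration of the Fire pattern, not the actual experiments/train.py
import fire

def main(dataset="cifar10", lr=1e-3, intrinsic_dim=1000, seed=137):
    """Each keyword argument becomes a CLI flag, e.g. --dataset=mnist --lr=1e-4."""
    print(dataset, lr, intrinsic_dim, seed)

if __name__ == "__main__":
    fire.Fire(main)
```

For example, to train a model in a low-dimensional subspace: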
```shell
python experiments/train.py --dataset=cifar10 \
    --model-name=resnet18k \
    --base-width=64 \
    --optimizer=adam \
    --epochs=500 \
    --lr=1e-3 \
    --intrinsic_dim=1000 \
    --intrinsic_mode=rdkronqr \
    --seed=137
```

All arguments in the main method of `experiments/train.py` are valid CLI arguments. The most important ones are noted here:
- `--seed`: Setting the seed is important so that any subsequent runs using the checkpoint can reconstruct the same random parameter projection matrices used during training (a sketch of this seeded subspace construction follows this list).
- `--data_dir`: Parent path of the directory containing the root directory of the dataset.
- `--dataset`: Dataset name. See `data.py` for the list of dataset strings.
- `--intrinsic_dim`: Dimension of the training subspace of parameters.
- `--intrinsic_mode`: Method used to generate (sparse) random projection matrices. See the `create_intrinsic_model` method in `projectors.py` for a list of valid modes.
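To make the role of the seed and the intrinsic dimension concrete, here is a minimal sketch of subspace training under simplifying assumptions: a plain dense Gaussian projection stands in for the sparse/structured projections (such as `rdkronqr`) implemented in `projectors.py`, and a toy quadratic objective stands in for a network loss.

```python
import torch

# The seed fixes the random projection, so a later run (e.g. bound computation)
# can rebuild exactly the same subspace from the checkpointed seed.
torch.manual_seed(137)

D = 1000  # number of flattened model parameters (toy size)
d = 100   # intrinsic dimension of the training subspace

theta_0 = torch.randn(D)                # frozen initialization in the full parameter space
P = torch.randn(D, d) / d ** 0.5        # fixed random projection (dense here for simplicity)
z = torch.zeros(d, requires_grad=True)  # only these d subspace coordinates are trained

def flat_params():
    # Effective parameters: an offset from the initialization inside a d-dimensional subspace.
    return theta_0 + P @ z

# Toy quadratic objective standing in for a training loss.
target = torch.randn(D)
opt = torch.optim.Adam([z], lr=1e-3)
for _ in range(200):
    loss = ((flat_params() - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```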
Distributed training is helpful for large datasets like ImageNet to spread computation over multiple GPUs. We rely on `torchrun`.

To use multiple GPUs on a single node, we need to:
- Set GPU visibility appropriately via `CUDA_VISIBLE_DEVICES`.
- Specify the number `x` of GPUs made visible via `--nproc_per_node=<x>`.
- Specify a random port `yyyy` on the host for inter-process communication via `--rdzv_endpoint=localhost:yyyy`.
For the same run as above, we simply replace `python` with `torchrun`:

```shell
CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --rdzv_endpoint=localhost:9999 experiments/train.py ...
```

All remaining CLI arguments remain unchanged.
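For context, `torchrun` launches one process per visible GPU and passes rank information through environment variables. A minimal sketch of how a process can pick up its device (illustrative only; `experiments/train.py` handles the distributed setup itself):

```python
# ddp_check.py -- launch with: torchrun --nproc_per_node=2 ddp_check.py
import os
import torch
import torch.distributed as dist

# torchrun sets LOCAL_RANK, RANK, WORLD_SIZE (and the rendezvous address) per process.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

dist.init_process_group(backend="nccl")  # NCCL is the usual backend for GPU training
print(f"rank {dist.get_rank()} of {dist.get_world_size()} on cuda:{local_rank}")
dist.destroy_process_group()
```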
The key argument needed for transfer is the path to the configuration file named `net.cfg.yml` of the pretrained network.
```shell
python experiments/train.py --dataset=fmnist \
    --optimizer=adam \
    --epochs=500 \
    --lr=1e-3 \
    --intrinsic_dim=1000 \
    --intrinsic_mode=rdkronqr \
    --prenet_cfg_path=<path/to/net.cfg.yml> \
    --seed=137 \
    --transfer
```

In addition to the earlier arguments, there is only one new key argument:
- `--prenet_cfg_path`: Path to the `net.cfg.yml` configuration file of the pretrained network. This path is logged during the train command specified previously.
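If you want to check which pretrained network a given `net.cfg.yml` points to before launching transfer, the file can be inspected with PyYAML; a minimal sketch (the schema of the file is repository-specific, so no particular keys are assumed):

```python
import yaml  # PyYAML

cfg_path = "path/to/net.cfg.yml"  # placeholder path logged by the earlier training run

with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Print whatever the pretrained-network config recorded, without assuming its schema.
for key, value in cfg.items():
    print(f"{key}: {value}")
```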
Data-dependent bounds first require pre-training on a fixed subset of the training data and then training an intrinsic-dimensionality model on the remaining data.
For such training, we use the following command:
```shell
python experiments/train_dd_priors.py --dataset=cifar10 \
    ...
    --indices_path=<path/to/index/list> \
    --train-subset=0.1 \
    --seed=137
```

The key new arguments here, in addition to the ones seen previously, are:
- `--indices-path`: A fixed permutation of indices as a numpy list equal to the length of the dataset. If not specified, a random permutation is generated every time and the results may not be reproducible. See `dataset_permutations.ipynb` for an example of how to generate such a file (a minimal sketch also follows this list).
- `--train-subset`: A fractional subset of the training data to use. If a negative fraction, then the complement is used.
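A minimal sketch of generating such an index file, in the spirit of `dataset_permutations.ipynb` (the dataset length and output filename below are placeholders):

```python
import numpy as np

np.random.seed(137)

n = 50000  # length of the training set (e.g. CIFAR-10); adjust to your dataset
perm = np.random.permutation(n)

# Save the fixed permutation so every run slices the same data subsets.
np.save("cifar10_indices.npy", perm)
```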
Once we have the checkpoints of intrinsic-dimensionality models, the bound can be computed using:
```shell
python experiments/compute_bound.py --dataset=mnist \
    --misc-extra-bits=7 \
    --quant-epochs=30 \
    --levels=50 \
    --lr=0.0001 \
    --prenet_cfg_path=<path/to/net.cfg.yml> \
    --use_kmeans=True
```

The key arguments here are:
- `--misc-extra-bits`: Penalty for hyperparameter optimization during bound computation; equals the bits required to encode all hyperparameter configurations.
- `--levels`: Number of quantization levels.
- `--quant-epochs`: Number of epochs used for fine-tuning the quantization levels.
- `--lr`: Learning rate used for fine-tuning the quantization levels.
- `--use_kmeans`: When true, uses k-means clustering to initialize the quantization levels. Otherwise, random initialization is used (a sketch of the k-means initialization follows this list).
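To make the `--use_kmeans` option concrete, here is a minimal sketch of k-means initialization of quantization levels on a flat weight vector (an illustration of the idea only, not the code path in `experiments/compute_bound.py`):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(size=(10000, 1))  # flattened parameters (toy data)
levels = 50

# k-means centroids serve as the initial quantization levels ...
km = KMeans(n_clusters=levels, n_init=10, random_state=0).fit(weights)
centers = km.cluster_centers_.ravel()

# ... and each weight is snapped to its nearest level.
assignments = np.abs(weights - centers[None, :]).argmin(axis=1)
quantized = centers[assignments]

print("quantization MSE:", float(((quantized - weights.ravel()) ** 2).mean()))
```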
Apache 2.0