Hello. I hope you're doing fine!
One of the main difficulties I run into while delving into breaching is the lack of precise documentation.
This is by no means a critique, as I am very aware that every project depends on the work of multiple parties, so it doesn't diminish your work at all.
But with that statement, I want to open up a possible improvement that I would like to take care of myself.
The main problem, as I see it, is that the information is scattered across so many files and READMEs.
It takes time to pull it all together and start seeing how the pieces work with each other.
From what I have been reading about Hydra, this is something the framework forces through its architecture and the way config groups/packages work.
But while digging around, I came across the following documentation: https://hydra.cc/docs/configure_hydra/app_help/
So it is possible, with Hydra itself, to present more or less centralized information to the user.
There are also a lot of helpful commands that I think could eventually be aggregated into the READMEs,
but my main objective with this issue is to propose a way to document and index the massive number of parameters that breaching implements.
Based on all this, I have created a prototype branch to propose a way to do this with Hydra's application help:
main...RageAgainstTheMachineLearning:breaching:chore/hydra-app-help
You can peek at the code and check out the branch to see how it works; I'll also describe it here.
I created a custom OmegaConf resolver and registered it in `__init__.py`. This resolver can read files so that their contents can be consumed by Hydra YAMLs; with it, I can pull information from the READMEs into the help config.
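For reference, here is a minimal sketch of how such a resolver could be registered; this is not the actual branch code, and the resolver name `read_file` is an assumption for illustration:

```python
# Minimal sketch of a file-reading OmegaConf resolver (assumed name: "read_file").
# Registered at import time (e.g. in breaching/__init__.py) so that it is
# available when Hydra composes and renders the help config.
from pathlib import Path

from omegaconf import OmegaConf


def _read_file(path: str) -> str:
    """Return the raw text of a file so YAML configs can interpolate it."""
    return Path(path).read_text(encoding="utf-8")


OmegaConf.register_new_resolver("read_file", _read_file, replace=True)
```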
With that, we would be able to cover two use cases:
- The user has not cloned the repo yet and is just browsing the code on the web, so the READMEs remain a useful source of information.
- The user has already cloned the repo and installed its dependencies, and can now use the --help flag to see full, centralized documentation.
So help.yaml consumes some READMEs that I found relevant, plus one I created, impl/README, to produce a prototype/suggestion of what could be done.
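As an illustration, a help config along these lines could pull README contents in via the resolver; the file name, paths and section layout below are assumptions, not the branch's actual contents:

```yaml
# conf/hydra/help/breaching_help.yaml -- hypothetical path and contents
app_name: breaching
header: ${read_file:breaching/README.md}
footer: |-
  Powered by Hydra (https://hydra.cc)
  Use --hydra-help to view Hydra specific help
template: |-
  ${hydra.help.header}
  == Configuration groups ==
  Compose your configuration from those groups:
  $APP_CONFIG_GROUPS
  ${read_file:breaching/config/README.md}
  == Case/Implementation group options ==
  ${read_file:breaching/config/case/impl/README.md}
  ==
  ${hydra.help.footer}
```

To activate it, the primary config's defaults list would then select it (per the Hydra app_help docs) with something like `- override hydra/help: breaching_help`.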
The result then is this:
$ python simulate_breach.py --help
# Breaching
This is the breaching package. You'll find attack implementations under `breaching.attacks`, use case implementations under `breaching.cases`, metrics under `breaching.analysis` and the configurations for everything under `breaching.config`. Several helper files are implemented under `breaching.utils`.
More details can be found in each module.
== Configuration groups ==
Compose your configuration from those groups:
attack: _default_optimization_attack, analytic, april_analytic, beyondinfering, clsattack, decepticon, deepleakage, imprint, invertinggradients, legacy, modern, multiscale_ghiasi, rgap, sanitycheck, seethroughgradients, tag, wei
case: 0_sanity_check, 10_causal_lang_training, 1_single_image_small, 2_single_imagenet, 4_fedavg_small_scale, 5_small_batch_imagenet, 6_large_batch_cifar, 8_industry_scale_fl, 9_bert_training
case/data: Birdsnap, CIFAR10, CIFAR100, ImageNet, ImageNetAnimals, TinyImageNet, cola, random-tokens, shakespeare, stackoverflow, wikitext
case/data/db: LMDB, none
case/impl: default
case/server: honest-but-curious, malicious-fishing, malicious-model-cah, malicious-model-rtf, malicious-transformer
case/user: local_gradient, local_updates, multiuser_aggregate
# Configuration
This is a `hydra-core` configuration folder. There is no "full" `.yaml` file for each configuration as the full configuration is assembled from the folder structure shown here. Any parameter can be overwritten or a new parameter added at runtime using the `hydra` syntax.
**Caveat:** Overriding a whole group of options (for example when choosing a different dataset) requires the syntax `case/data=CIFAR10`!
Using only `case.data=CIFAR10` will only override the name of the dataset and does not include the full group of configurations.
== Attack group options ==
TODO
== Case group options ==
TODO
== Case/Data group options ==
TODO
== Case/Implementation group options ==
# Dataloader implementation choices:
# Turn these off for better reproducibility of experiments
- shuffle: @boolean DEFAULT=False
- Samples elements randomly. If without replacement, then sample from a shuffled dataset.
- sample_with_replacement: @boolean DEFAULT=False
- Used along with shuffle. Samples are drawn on-demand with replacement if True
# PyTorch configuration
- dtype: @torch.dtype DEFAULT=float
- This has to be float when mixed_precision is True. A torch.dtype is an object that represents the data type of a torch.Tensor. PyTorch has several different data types, refer to https://docs.pytorch.org/docs/stable/tensor_attributes.html for more info.
- non_blocking: @boolean DEFAULT=True
- Doesn't seem to be in use?
- sharing_strategy: @string DEFAULT=file_descriptor
- Defines the strategy torch.multiprocessing uses to provide shared views on the same data in different processes. Refer to https://docs.pytorch.org/docs/stable/multiprocessing.html#sharing-strategies for more info.
- enable_gpu_acc: @boolean DEFAULT=False
- Uses CUDA as torch device instead of CPU.
- benchmark: @boolean DEFAULT=True
- Causes cuDNN to benchmark multiple convolution algorithms and select the fastest.
- deterministic: @boolean DEFAULT=False
- This option will disable cuDNN non-deterministic ops.
- pin_memory: @boolean DEFAULT=True
- threads: @int DEFAULT=0
- Maximum number of CPU dataloader workers used per GPU
- persistent_workers: @boolean DEFAULT=False
- mixed_precision: @boolean DEFAULT=False
- grad_scaling: @boolean DEFAULT=True
- This is a no-op if mixed-precision is off
- JIT: "script"|"trace"|null DEFAULT=null
- script currently breaks autocast mixed precision
- trace breaks training
- validate_every_nth_step: @int DEFAULT=10
- checkpoint: @object
- checkpoint.name: @string
- checkpoint.save_every_nth_step: @int DEFAULT=10
- enable_huggingface_offline_mode: @boolean DEFAULT=True
- `huggingface` needs an internet connection for metrics, datasets and tokenizers. After caching these objects, it can be switched to offline mode with this argument.
== Case/Server group options ==
TODO
== Case/User group options ==
TODO
==
Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
This is a proposal, and also a way to see whether you find something like this interesting and potentially valuable to the project.
So what do you think?
Please, let me know!
If you have another way to do this, or even if you don't think it's a good idea at all, let me know!
I am willing to do this, or any other form of documentation that you suggest, or to move on if it is not desired.
Thank you!