ACL: Adversarial Contrastive Learning for LLM Quantization Attacks.


Setup

Add the following environment variables to ~/.bashrc:

vi ~/.bashrc
export HF_TOKEN=<YOUR TOKEN>
export HF_ALLOW_CODE_EVAL=1
source ~/.bashrc
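As a quick sanity check after reopening the shell (a minimal sketch; the variable names follow the exports above), you can confirm the variables are visible to Python:

```python
import os

def check_env(env):
    """Report which of the required variables are present
    (names taken from the exports above)."""
    required = ("HF_TOKEN", "HF_ALLOW_CODE_EVAL")
    return {name: name in env for name in required}

if __name__ == "__main__":
    for name, ok in check_env(os.environ).items():
        print(name, "set" if ok else "missing")
```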

Environment Setup

conda create -n ACL python=3.11 -y
conda activate ACL
pip install -r requirements.txt

Quick Start

Download Model

cd ACL
hf download "meta-llama/Llama-3.2-1B-Instruct" --local-dir base_models/llama3.2-1b-instruct
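To verify the download landed where the scripts expect it, a small check like this can help (the file names are assumptions based on the usual Hugging Face model layout):

```python
from pathlib import Path

def model_files_present(local_dir):
    """True if the directory looks like a downloaded HF model:
    a config.json plus at least one weights shard (.safetensors)."""
    d = Path(local_dir)
    return d.is_dir() and (d / "config.json").exists() and any(d.glob("*.safetensors"))

if __name__ == "__main__":
    print(model_files_present("base_models/llama3.2-1b-instruct"))
```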

Fine-tune

Two-stage fine-tuning (injection and removal).

./run_injection_and_removal.sh llama3.2-1b-instruct
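In attacks of this family, the removal stage typically repairs full-precision behavior while keeping each weight inside the interval that still rounds to the injected quantized value, so the quantized model stays malicious. The exact objective here is defined by the scripts, but the constraint can be sketched as follows (the function name and absmax scaling are assumptions):

```python
def project_into_quantization_box(w, w_injected, scale):
    """Clamp a repaired weight into the interval that still quantizes
    (round-to-nearest) to the same integer code as the injected weight.
    A sketch of the removal-stage constraint; details are assumptions."""
    q = round(w_injected / scale)          # integer code of the injected weight
    lo, hi = (q - 0.5) * scale, (q + 0.5) * scale
    return min(max(w, lo), hi)
```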

If you have 8 GPUs, you can perform distributed fine-tuning by running

./run_injection_and_removal_8gpu.sh llama3.2-1b-instruct

Evaluate Attack Success Rate (ASR):

Evaluate ASR under three attack scenarios (ad_inject, over_refusal, and jailbreak) across three zero-shot LLM quantization settings: INT8, FP4, and NF4.

./run_evaluate_asr.sh llama3.2-1b-instruct ad_inject fp4
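For reference, ASR in a scenario like ad_inject is typically the fraction of generations containing the injected content. A minimal scorer might look like this (the trigger-matching rule is an assumption; the repo's script defines the real metric):

```python
def attack_success_rate(outputs, trigger):
    """Fraction of model outputs that contain the injected trigger string."""
    if not outputs:
        return 0.0
    return sum(trigger in out for out in outputs) / len(outputs)
```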

Evaluate Benchmark:

Evaluate MMLU and TruthfulQA under three attack scenarios (ad_inject, over_refusal, and jailbreak) across three zero-shot LLM quantization settings: INT8, FP4, and NF4.

./run_evaluate_benchmark.sh llama3.2-1b-instruct ad_inject fp4
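The zero-shot settings above (INT8, FP4, NF4) apply round-to-nearest quantization directly to the fine-tuned checkpoint, with no retraining. The idea can be sketched with a toy absmax INT8 round-trip (real INT8/FP4/NF4 in bitsandbytes use per-block scales and, for 4-bit, non-uniform code books):

```python
def absmax_quantize_int8(weights):
    """Toy absmax INT8 quantization: scale so the largest magnitude
    maps to 127, then round each weight to the nearest integer code."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]
```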

Acknowledgements

Our code is based on llm-quantization-attack and llm-pruning-attack.

We thank the teams for their open-source implementations.

Citation

If you find ACL useful or relevant to your project and research, please cite our paper:

@article{song2026acl,
        title={Adversarial Contrastive Learning for LLM Quantization Attacks},
        author={Song, Dinghong and Xu, Zhiwei and Wan, Hai and Zhao, Xibin and Su, Pengfei and Li, Dong},
        journal={arXiv},
        year={2026}
}
