Source code for EMNLP 2024 Findings paper: Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models.

KDEGroup/Buzzer

Buzzer

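Step 1: Pretrain the models

  • Pretrain the target model
  • Pretrain the shadow model
  • Pretrain the calibration model
Run the script for your target model ({target_model} is a placeholder for the model name):
bash ./script/pretrain_{target_model}.sh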

Step 2: Extract model losses

  • Target model loss
  • Shadow model loss
bash ./script/sequence_feature_{target_model}.sh
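
As a rough illustration of what a per-sequence loss feature can look like, the sketch below computes the mean negative log-likelihood of a token sequence. It is a minimal example under stated assumptions, not the repository's extraction code: the function name `sequence_loss` and the toy probabilities are hypothetical, and the actual script may compute different or additional features.

```python
import math


def sequence_loss(token_probs):
    """Mean negative log-likelihood of one token sequence.

    token_probs: probability the model assigned to each ground-truth token.
    This input format is an assumption for illustration; a real pipeline
    would derive these values from the model's output logits.
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Toy values, not real model outputs.
seen_snippet = [0.9, 0.8, 0.95]   # model is confident on every token
unseen_snippet = [0.3, 0.2, 0.5]  # model is less confident

print("seen loss:  ", round(sequence_loss(seen_snippet), 4))
print("unseen loss:", round(sequence_loss(unseen_snippet), 4))
```

Loss-based membership inference generally relies on the intuition that a model tends to assign lower loss to samples it was trained on, which is why the same feature is extracted from both the target and the shadow model.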

Step 3: Classification

White-box inference

Train the classifier on the target model's training data and test it on the target model's testing data.

Black-box inference

Train the classifier on the shadow model's training data and test it on the target model's testing data.

bash ./script/{target_model}_mia.sh
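
The difference between the two settings is only where the classifier's training data comes from. The sketch below makes that concrete with a single loss threshold standing in for the classifier. This is an assumption for illustration: the threshold rule, the helper names `fit_threshold` and `accuracy`, and all loss values are hypothetical, and the repository's classifier may be a different model.

```python
def fit_threshold(losses, is_member):
    """Pick the loss threshold with the best accuracy on labeled data.

    A sample is predicted to be a member when its loss <= threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(losses)):
        correct = sum(
            (loss <= candidate) == member
            for loss, member in zip(losses, is_member)
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def accuracy(threshold, losses, is_member):
    """Fraction of samples whose membership is predicted correctly."""
    correct = sum(
        (loss <= threshold) == member
        for loss, member in zip(losses, is_member)
    )
    return correct / len(losses)


# Toy (loss, membership) data; these are made-up numbers, not results.
target_train = ([0.10, 0.30, 0.90, 1.10], [True, True, False, False])
shadow_train = ([0.15, 0.20, 0.80, 1.00], [True, True, False, False])
target_test = ([0.12, 0.25, 0.95, 1.20], [True, True, False, False])

# White-box: fit on the target model's data, test on the target model.
white_box = fit_threshold(*target_train)
# Black-box: fit on the shadow model's data, test on the target model.
black_box = fit_threshold(*shadow_train)

print("white-box accuracy:", accuracy(white_box, *target_test))
print("black-box accuracy:", accuracy(black_box, *target_test))
```

In both settings the evaluation data is the same; only the source of the training features changes, which is why the shadow model has to be pretrained and have its losses extracted in the earlier steps.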
