Source code for the EMNLP 2024 Findings paper "Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models".
- pretrain the target model
- pretrain the shadow model
- pretrain the calibration model
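
```bash
bash ./script/pretrain_{target_model}.sh
```

Replace `{target_model}` with the name of the target model.

Then extract the sequence features:
- target model loss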
- shadow model loss
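The README does not spell out how the two losses become features. A minimal sketch, assuming the per-token negative log-likelihoods of one code sample have already been computed under the target and the shadow model; the function `sequence_features` and its feature names are hypothetical and need not match what `sequence_feature_{target_model}.sh` actually produces:

```python
from statistics import mean
from typing import Dict, List


def sequence_features(
    target_token_losses: List[float], shadow_token_losses: List[float]
) -> Dict[str, float]:
    """Hypothetical per-sample features built from per-token losses.

    Both lists are assumed to hold per-token negative log-likelihoods for the
    same tokenized code sample, under the target and the shadow model.
    """
    if not target_token_losses or len(target_token_losses) != len(shadow_token_losses):
        raise ValueError("loss sequences must be non-empty and aligned token by token")
    target_loss = mean(target_token_losses)
    shadow_loss = mean(shadow_token_losses)
    return {
        "target_loss": target_loss,
        "shadow_loss": shadow_loss,
        # Positive when the target model fits the sample better than the shadow
        # model does, which a loss-based attack reads as a membership signal.
        "loss_gap": shadow_loss - target_loss,
        "target_max_loss": max(target_token_losses),
    }


if __name__ == "__main__":
    print(sequence_features([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
    # {'target_loss': 2.0, 'shadow_loss': 3.0, 'loss_gap': 1.0, 'target_max_loss': 3.0}
```

Averaging over tokens keeps samples of different lengths comparable, and subtracting the shadow loss discounts samples that are simply easy for any model to predict.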

```bash
bash ./script/sequence_feature_{target_model}.sh
```

Then run the membership inference attack. Train on the target model's training data and test on the target model's testing data.
Alternatively, train on the shadow model's training data and test on the target model's testing data.

```bash
bash ./script/{target_model}_mia.sh
```
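The two settings differ only in which labeled data the attack is fit on; both are evaluated on the target model's testing data. A minimal sketch, assuming each sample is reduced to a single membership score where larger means "more likely a member", and substituting a simple accuracy-maximizing threshold for whatever attack model `{target_model}_mia.sh` actually trains. The names `fit_threshold`, `accuracy`, and the toy data are illustrative only, not results from the paper:

```python
from typing import List, Tuple

# (membership score, label) with label 1 = member, 0 = non-member.
# Hypothetical format: assumes a higher score means "more likely a member".
Sample = Tuple[float, int]


def accuracy(samples: List[Sample], threshold: float) -> float:
    """Fraction of samples classified correctly when score >= threshold means member."""
    correct = sum(int(score >= threshold) == label for score, label in samples)
    return correct / len(samples)


def fit_threshold(train: List[Sample]) -> float:
    """Return the observed score that, used as a threshold, maximizes training accuracy."""
    if not train:
        raise ValueError("need at least one training sample")
    best_t, best_acc = train[0][0], -1.0
    for t in sorted({score for score, _ in train}):
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


if __name__ == "__main__":
    # Toy scores standing in for attack features (shadow-model setting).
    shadow_train = [(1.2, 1), (0.9, 1), (0.1, 0), (-0.3, 0)]
    target_test = [(1.0, 1), (0.2, 0), (0.8, 1), (-0.1, 0)]
    t = fit_threshold(shadow_train)
    print(t, accuracy(target_test, t))  # 0.9 0.75
```

For the first setting, pass scores computed on the target model's training data to `fit_threshold` in place of `shadow_train`; the evaluation on the target model's testing data stays the same.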