Description

Running train.py under the DeepSpeed launcher, the process is killed shortly after the coefficient for the CAKLD loss is computed (return code -9). Log excerpt:
Get the coefficient: 0.607680082321167
[rank0]:W1030 09:49:19.582331 2310 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
[rank0]:W1030 09:49:19.582331 2310 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[2025-10-30 09:50:46,778] [INFO] [launch.py:335:sigkill_handler] Killing subprocess 2310
[2025-10-30 09:50:46,778] [ERROR] [launch.py:341:sigkill_handler] ['/root/miniconda3/envs/xzy_bitkd/bin/python3.10', '-u', 'train.py', '--local_rank=0', '--model_name_or_path', '/root/autodl-tmp/model/llama2-7b', '--data_path', '/root/autodl-tmp/BitDistiller-main/data/generation/data/llama2-7b-raw/mix_wiki_alpaca_8000.json', '--model_max_length', '1024', '--output_dir', './ckpts/hf-llama-2-7b/', '--logging_dir', './logs/hf-llama-2-7b/', '--num_train_epochs', '4', '--bf16', 'True', '--seed', '42', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '16', '--gradient_accumulation_steps', '1', '--gradient_checkpointing', 'True', '--evaluation_strategy', 'steps', '--eval_steps', '4', '--load_best_model_at_end', 'True', '--save_strategy', 'steps', '--save_steps', '4', '--save_total_limit', '15', '--learning_rate', '8e-6', '--lr_scheduler_type', 'constant', '--weight_decay', '0.', '--logging_steps', '1', '--report_to', 'tensorboard', '--deepspeed', 'config/zero.json', '--bits', '2', '--quant_type', 'int2-asym', '--q_group_size', '128', '--train_kd', 'True', '--kd_loss_type', 'cakld', '--max_train_samples', '999999', '--clip', '/root/autodl-tmp/BitDistiller-main/clip_cache/hf-llama2-7b/int2-g128.pt'] exits with return code = -9
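
The TORCH_CUDA_ARCH_LIST warning is harmless but slows JIT compilation of the CUDA extensions, since kernels are built for every architecture of the visible cards. A minimal sketch of silencing it, following the warning's own suggestion; the value "8.0" is an assumption for an A100-class GPU and must match your actual card:

```python
import os

# Pin CUDA extension compilation to one architecture instead of all visible
# cards. "8.0" assumes an A100; use e.g. "8.6" for RTX 30-series or "8.9"
# for RTX 40-series. Must be set before the extension is first built.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0"
```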
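
Return code -9 means the training subprocess received SIGKILL. With DeepSpeed this is commonly the Linux OOM killer terminating the run for exhausting host RAM (for example while initializing ZeRO optimizer state or loading the 7B checkpoint), not a CUDA out-of-memory error; `dmesg` on the host usually confirms it. A minimal diagnostic sketch, assuming the third-party `psutil` package is available, that reports host-memory headroom before launching:

```python
import psutil

# SIGKILL (-9) from the DeepSpeed launcher usually means the host ran out of
# RAM. Check headroom before launching: a 7B model in bf16 plus optimizer
# state can need tens of GB of host memory depending on the ZeRO config.
mem = psutil.virtual_memory()
print(f"host RAM: {mem.available / 2**30:.1f} GiB free "
      f"of {mem.total / 2**30:.1f} GiB total")
```

If host memory is the bottleneck, options consistent with the command above include lowering --per_device_train_batch_size from 16 or using a ZeRO configuration with less host-side state in config/zero.json.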