```bash
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "bf16" run_ccot.py \
    --model_name_or_path models/Llama-3.2-1B \
    --config_name configs/pccot_llama3.2_1b.json \
    --config_overrides num_iterations=3,loss_gamma=20.0,use_layerwise_std=false \
    --num_latent_tokens 24 \
    --train_file dataset/gsm/train.json \
    --validation_file gsm/valid.json \
    --test_file /mnt/geminisgceph1/geminicephfs/mmsearch-luban-universal/group_semantic_doc/user_mylasong/LatentReasoning/dataset/gsm/test.json \
    --label_names labels cot_labels \
    --use_peft False \
    --lora_target_modules q_proj-k_proj-v_proj-o_proj-down_proj-up_proj-gate_proj \
    --lora_modules_to_save "" \
    --remove_unused_columns false \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 32 \
    --auto_find_batch_size \
    --gradient_accumulation_steps 1 \
    --block_size 1024 \
    --attn_implementation flash_attention_2 \
    --use_liger_kernel \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.03 \
    --learning_rate 8e-4 \
    --weight_decay 1e-1 \
    --bf16 \
    --torch_dtype bfloat16 \
    --do_train \
    --do_predict False \
    --num_train_epochs 10 \
    --save_strategy epoch \
    --eval_strategy epoch \
    --logging_steps 50 \
    --report_to wandb \
    --run_name pcot-llama3.2_1b-gsm-num_iterations3_num_latent_tokens24_gamma20 \
    --output_dir outputs/pcot-llama3.2_1b-gsm-num_iterations3_num_latent_tokens24_gamma20 \
    --seed 1 \
    --overwrite_output_dir
```
Thanks for your great work!
I trained Llama-3.2-1B (base) with full-parameter tuning using the hyperparameters above, but the final performance was quite poor (acc = 0.02). Could you please suggest recommended training parameters for Llama-3.2-1B? My understanding is that the base model and the instruct model should not differ much in terms of configuration and results.
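For reference, this is roughly the evaluation-only command I would use to re-check the test accuracy on the saved checkpoint. It is only a sketch: I am assuming `run_ccot.py` supports a predict-only pass when `--model_name_or_path` points at the trained output directory, and the checkpoint/test-file paths here are illustrative (the real test file is the long absolute path shown in the training command).

```bash
# Sketch: predict-only run on the trained checkpoint.
# Assumptions: run_ccot.py accepts --do_predict without --do_train, and the
# output directory of the training run can be loaded as a checkpoint.
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "bf16" run_ccot.py \
    --model_name_or_path outputs/pcot-llama3.2_1b-gsm-num_iterations3_num_latent_tokens24_gamma20 \
    --config_name configs/pccot_llama3.2_1b.json \
    --config_overrides num_iterations=3,loss_gamma=20.0,use_layerwise_std=false \
    --num_latent_tokens 24 \
    --test_file dataset/gsm/test.json \
    --label_names labels cot_labels \
    --remove_unused_columns false \
    --per_device_eval_batch_size 32 \
    --block_size 1024 \
    --attn_implementation flash_attention_2 \
    --bf16 \
    --torch_dtype bfloat16 \
    --do_predict \
    --output_dir outputs/pcot-llama3.2_1b-gsm-num_iterations3_num_latent_tokens24_gamma20 \
    --seed 1
```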