-
Notifications
You must be signed in to change notification settings - Fork 68
Description
Hey there, I'm trying to run the eval on the random set with retrieval right now. im running on 10 workers across 2 40GB A100's, coming up on 12 hours now. It seems that lean is a much bigger bottleneck than the model itself from my profiling, see attached. The highlighted pink cell appears to be calls to the transformer model while the adjacent cell to the left are all pexpect calls to lean. is there anything im missing about how to run this or is this the expected duration?
and very sorry for my amateur approach, but from text searching through the logs i think i've only completed ~200 proof searches in this time. i don't see an indicator of the number completed in the logs other than this. at this rate it will take days to complete the eval, it seems i might be doing something wrong. i am using the default values for timeout and tactic samples:
num_sampled_tactics: int = 64,
timeout: int = 600
running:
python prover/evaluate.py --data-path data/leandojo_benchmark_4/random/ --gen_ckpt_path kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small --ret_ckpt_path kaiyuy/leandojo-lean4-retriever-byt5-small --indexed-corpus-path $PATH_TO_INDEXED_CORPUS --split test --num-workers 10 --num-gpus 2 --save-results
Thanks
