running eval correctly? #80

@bogedy

Hey there, I'm trying to run the eval on the random split with retrieval. I'm running 10 workers across two 40 GB A100s, and it's been going for almost 12 hours now. From my profiling, Lean seems to be a much bigger bottleneck than the model itself (see attached): the highlighted pink cell corresponds to calls to the transformer model, while the adjacent cell to the left is all `pexpect` calls to Lean. Is there anything I'm missing about how to run this, or is this the expected duration?
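For reference, here's roughly how I'm attributing time between the model and Lean. This is a minimal, self-contained sketch, not the actual ReProver code; the `time.sleep` calls stand in for a real transformer generation call and a real `pexpect` round-trip to Lean:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, totals):
    # Accumulate wall-clock time per labeled section.
    start = time.monotonic()
    try:
        yield
    finally:
        totals[label] = totals.get(label, 0.0) + (time.monotonic() - start)

totals = {}
with timed("model", totals):
    time.sleep(0.01)  # stand-in for a transformer generation call
with timed("lean", totals):
    time.sleep(0.02)  # stand-in for a pexpect round-trip to Lean

# The dominant section is the one with the largest accumulated time.
print(max(totals, key=totals.get))
```

In my actual runs, the "lean" bucket dominates by a wide margin.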

Sorry for my amateur approach, but from text-searching the logs I believe I've only completed ~200 proof searches in this time (I don't see any other progress indicator in the logs). At this rate the eval will take days, so it seems I might be doing something wrong. I'm using the default values for the timeout and the number of sampled tactics:

```python
num_sampled_tactics: int = 64,
timeout: int = 600,
```
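For a rough sense of scale, here's my back-of-envelope estimate of the remaining time, assuming the test split has on the order of 2,000 theorems (an assumed figure; I haven't verified the exact count):

```python
# Back-of-envelope estimate of total eval time from observed throughput.
theorems = 2000        # assumed test-split size (not verified)
completed = 200        # proof searches finished so far, per the logs
hours_elapsed = 12

rate = completed / hours_elapsed            # searches/hour across all workers
remaining_hours = (theorems - completed) / rate
print(f"~{rate:.0f} searches/hour; ~{remaining_hours / 24:.1f} days remaining")
```

which is where my "it will take days" estimate comes from.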

running:

```bash
python prover/evaluate.py \
  --data-path data/leandojo_benchmark_4/random/ \
  --gen_ckpt_path kaiyuy/leandojo-lean4-retriever-tacgen-byt5-small \
  --ret_ckpt_path kaiyuy/leandojo-lean4-retriever-byt5-small \
  --indexed-corpus-path $PATH_TO_INDEXED_CORPUS \
  --split test --num-workers 10 --num-gpus 2 --save-results
```

Thanks

[Attached: profiler screenshot — highlighted pink cell: transformer model calls; adjacent cell to the left: `pexpect` calls to Lean]
