-
Notifications
You must be signed in to change notification settings - Fork 68
Description
Hi, thanks for open releasing the source code of ReProver and providing a very helpful guide on training the model. I notice that the in-file negatives are critical for the performance of ReProver, as explained in the paper, and I'm trying to understand how this is implemented.
As far as I understand, given an example ex, this for-loop iterates over all premises that are present in the file where the context belongs, i.e., all premises in ex["context"].path.
ReProver/retrieval/datamodule.py
Lines 105 to 112 in 0dbb82e
| for p in self.corpus.get_premises(ex["context"].path): | |
| if p == ex["pos_premise"]: | |
| continue | |
| if p.end < ex["context"].theorem_pos: | |
| if ex["pos_premise"].path == ex["context"].path: | |
| premises_in_file.append(p) | |
| else: | |
| premises_outside_file.append(p) |
Here is what I found a bit confusing: when ex["pos_premise"].path != ex["context"].path (i.e., when the premise is imported from some other file instead of defined/proved in the same file), all of these premises we are iterating on will be added to premises_outside_file, instead of premises_in_file. This seems a bit counter-intuitive, because these premises actually come from the same file as the context, could you please explain a bit more on this?