Handling of in-file negatives in ReProver

Hi, thanks for open releasing the source code of ReProver and providing a very helpful guide on training the model. I notice that the in-file negatives are critical for the performance of ReProver, as explained in the paper, and I'm trying to understand how this is implemented.

As far as I understand, given an example `ex`, this for-loop iterates over all premises that are present in the file where the context belongs, i.e., all premises in `ex["context"].path`.

https://github.com/lean-dojo/ReProver/blob/0dbb82e3507cb8303dbb550bdb96bfbbb37e1ced/retrieval/datamodule.py#L105-L112

Here is what I found a bit confusing: when `ex["pos_premise"].path !=  ex["context"].path` (i.e., when the premise is imported from some other file instead of defined/proved in the same file), all of these premises we are iterating on will be added to `premises_outside_file`, instead of `premises_in_file`. This seems a bit counter-intuitive, because these premises actually come from the same file as the context, could you please explain a bit more on this?

	for p in self.corpus.get_premises(ex["context"].path):
	if p == ex["pos_premise"]:
	continue
	if p.end < ex["context"].theorem_pos:
	if ex["pos_premise"].path == ex["context"].path:
	premises_in_file.append(p)
	else:
	premises_outside_file.append(p)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling of in-file negatives in ReProver #46

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Handling of in-file negatives in ReProver #46

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions