Skip to content

Conversation

@mitochon
Copy link

Problem
FILTER expression involving '*' or '?' does not give correct result if more than 1 variable used.

Example:
query Q1 filter "(BAR[*] < 0.1)" test.vcf -> yields no result
query Q2 filter "(BAR[*] < 0.1) & (FOO[*] >= 2)" test.vcf -> yields 3 result
Q2 is more strict than Q1 so its result should always be a subset of Q1's result.

Cause
After looking at the code, the * or ? query currently supports 1 variable.
If there are multiple variables FOO, BAR both having * or ? predicate, the index n used to evaluate BAR[n] does not get evaluated properly.
This causes incorrect results to be generated. See also TestCasesFilter#test_57 in this pull request for concrete example.

Proposed Solution
Track the current index for each query variable in FieldIterator.
In this context FOO will track some index m and BAR will track another index n.
Previously both FOO and BAR tracks the same index n even though they are at different stages in the iteration.

Verification
Added the following test files:
TestCasesFilter#test_57
test/test_filter_multiple_var.vcf

<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr</artifactId>
<artifactId>antlr4</artifactId>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not relevant, but I had to make this change for mvn to run successfully.

FieldIterator.get().setMax(IteratorType.GENOTYPE_VAR, sub.length - 1);
FieldIterator.get().setType(index);
idx = FieldIterator.get().get(IteratorType.VAR);
idx = FieldIterator.get().get(IteratorType.GENOTYPE_VAR);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this change, but seems like this should be GENOTYPE_VAR.
Otherwise this need to be changed to

idx = FieldIterator.get().getVar(name)

Would recommend adding a test case that covers this use case.


// Filter data
SnpSiftCmdFilter snpsiftFilter = new SnpSiftCmdFilter();
String expression = "(EXAC_AF[*] <= 0.1) & (COSMIC_SITE_COUNT_SOMATIC[*] >= 2)";
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add this test case and undo the other changes above, EXAC_AF[*] always yields 0, so the left hand side of this expression always evaluates to true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant