@@ -40,6 +40,12 @@ protected MaxAggregate(BoundTerm<T> term) {

@Override
protected boolean hasValue(DataFile file) {
// If nanValueCounts > 0, the file contains NaN values that are not reflected
// in the upper bound, so we cannot conclude the max value from the bound
Long nanCount = safeGet(file.nanValueCounts(), fieldId);
if (nanCount != null && nanCount > 0) {
return false;
}
Contributor:

nit: Can we extract a small helper (e.g., hasNaNs) to keep this logic in one place?
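A sketch of the suggested helper, with a plain `Map` standing in for Iceberg's `DataFile` metrics; the `hasNaNs` name comes from the review suggestion, and the standalone shape here is an assumption:

```java
import java.util.Map;

// Sketch of the reviewer's suggested hasNaNs helper. A plain Map stands in
// for Iceberg's DataFile.nanValueCounts(); names here are illustrative only.
class NaNCheck {
    // Mirrors safeGet in the PR: tolerate a missing metrics map.
    static Long safeGet(Map<Integer, Long> map, int fieldId) {
        return map == null ? null : map.get(fieldId);
    }

    // True when the NaN count for the field is known and positive,
    // i.e. the min/max cannot be derived from the column bounds alone.
    static boolean hasNaNs(Map<Integer, Long> nanValueCounts, int fieldId) {
        Long nanCount = safeGet(nanValueCounts, fieldId);
        return nanCount != null && nanCount > 0;
    }
}
```

Both `MinAggregate` and `MaxAggregate` could then call the helper, keeping the NaN check in one place.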

boolean hasBound = safeContainsKey(file.upperBounds(), fieldId);
Long valueCount = safeGet(file.valueCounts(), fieldId);
Long nullCount = safeGet(file.nullValueCounts(), fieldId);
@@ -40,6 +40,12 @@ protected MinAggregate(BoundTerm<T> term) {

@Override
protected boolean hasValue(DataFile file) {
// If nanValueCounts > 0, the file contains NaN values that are not reflected
// in the lower bound, so we cannot conclude the min value from the bound
Long nanCount = safeGet(file.nanValueCounts(), fieldId);
if (nanCount != null && nanCount > 0) {
return false;
}
boolean hasBound = safeContainsKey(file.lowerBounds(), fieldId);
Long valueCount = safeGet(file.valueCounts(), fieldId);
Long nullCount = safeGet(file.nullValueCounts(), fieldId);
@@ -740,6 +740,7 @@ public void testNaN() {
sql(
"INSERT INTO %s VALUES (1, float('nan')),"
+ "(1, float('nan')), "
+ "(1, 10.0), "
Author:

The bug is reproduced by this change in the test class.

Once this approach is approved, I will update the test classes for the other Spark versions.

Member:

Could you explain why the bug is triggered with the addition of this row?

Author:

Basically, the bug is triggered when a data file contains a NaN count together with an upper bound and a lower bound.

Previously, without that line, only the NaN count was recorded; with this change, upper and lower bounds are generated as well.

Contributor:

Interesting, it's a bit unclear to me why the additional row would trigger collecting lower/upper bounds. I'd have to double-check whether there's some minimum row threshold or some other condition that controls whether lower/upper bounds are written with the footer. Looking at this test without the change, I would have expected lower/upper bounds of 1.0 and 2.0, respectively.

Contributor:

Could it be the floating-point nature of the 10.0 value? That seems like an unrelated bug of its own.
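To illustrate why the NaN count matters for the fix itself: a single NaN row makes the true max of the column NaN (NaN propagates through `Math.max`, matching Spark's NaN-is-largest ordering), while a footer upper bound computed over non-NaN values would not reflect it. A minimal sketch with hypothetical values:

```java
// Why bounds alone are not enough when NaNs are present: Math.max
// propagates NaN, so the true max of {1.0, 10.0, NaN} is NaN, while a
// footer upper bound computed over non-NaN values would report 10.0.
class NaNMax {
    static float maxOf(float... values) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            max = Math.max(max, v); // NaN wins once it appears
        }
        return max;
    }
}
```

This is exactly the situation the `hasValue` check guards against: when `nanValueCounts > 0`, the stored bound is not a safe answer for the aggregate.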

+ "(2, 2), "
+ "(2, float('nan')), "
+ "(3, float('nan')), "
@@ -762,7 +763,7 @@ public void testNaN() {

List<Object[]> actual = sql(select, tableName);
List<Object[]> expected = Lists.newArrayList();
expected.add(new Object[] {6L, Float.NaN, 1.0F, 6L});
expected.add(new Object[] {7L, Float.NaN, 1.0F, 7L});
assertEquals("expected and actual should equal", expected, actual);
}
