-
Notifications
You must be signed in to change notification settings - Fork 180
Implement reverse performance optimization
#4775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4483045 to
0246535
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: I recall the major comment on original PR is early optimization in analyzer layer. Is this new PR trying to address the concern? Ref: #4056 (comment)
Hi Chen, I think that's a valid concern. However, after trying it out, I think it has significant complexity comparing to the current approach. I think |
noCharger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add benchmark results on before VS after?
9fee6b2 to
13d798d
Compare
| if (node instanceof Aggregate | ||
| || node instanceof org.apache.calcite.rel.core.Join | ||
| || node instanceof Correlate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be simplified to Aggregate || BiRel || SetOp. It will cover more cases including set operations like union.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update! I go through all possible RelNodes built by RelBuilder and figure out more corner cases used in current PPL. We need to stop backtracking as well by then.
- Uncollect RelNode (it's usually used as a child of LogicalCorrelate, so we should be fine in most cases)
- When project contains window functions, aka RexOver, the ordering will be determined by window function's order by and partitions. So we should not destroy its output order as well.
For the other part, it seems fine for now. Please also add some complex query combined with different operators in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, also added UTs and ITs for different operators
|
LGTM. Please get other signoffs. |
|
Hi @ahkcs , #4784 allows user to specify a timestamp field in Although I doubt that there isn't much impact because all timechart commands have a sort at the end of their plans, making them to fall into your first tier. Can you please double check? |
This commit optimizes the `reverse` command in the Calcite planner by intelligently reversing existing sort collations instead of always using the ROW_NUMBER() approach. Key changes: - Added PlanUtils.reverseCollation() method to flip sort directions and null directions - Updated CalciteRelNodeVisitor.visitReverse() to: - Check for existing sort collations - Reverse them if present (more efficient) - Fall back to ROW_NUMBER() when no sort exists - Added comprehensive integration test expected outputs for: - Single field reverse pushdown - Multiple field reverse pushdown - Reverse fallback cases - Double reverse no-op optimizations This optimization significantly improves performance when reversing already-sorted data by leveraging database-native sort reversal. Based on PR opensearch-project#4056 by @selsong Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com> # Conflicts: # docs/user/ppl/cmd/reverse.rst
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
631a24d to
8977a8f
Compare
WalkthroughThis change implements a collation-based reverse pushdown for Calcite plans: it backtracks the RelNode tree to find Sort collations (respecting blocking operators), reverses or inserts reversed Sorts (or falls back to Changes
Sequence Diagram(s)sequenceDiagram
participant Q as Query (reverse)
participant V as CalciteRelNodeVisitor
participant B as Backtracker
participant T as RelNode Tree
participant U as PlanUtils
participant P as Final Plan
Q->>V: visit reverse node
V->>B: check current Sort collation
alt direct Sort with collation
B->>U: reverseCollation(existing)
U-->>V: reversed collation
V->>P: apply reversed collation to Sort
else no direct Sort
B->>T: walk ancestors until Sort or blocker
T-->>B: found Sort / blocker / none
alt ancestor Sort found (before blocker)
B->>T: rebuild subtree inserting reversed Sort
T-->>P: tree with inserted reversed Sort
else no Sort found
B->>T: check row type for `@timestamp`
alt `@timestamp` present
T->>P: insert Sort on `@timestamp` DESC
else
T->>P: no-op (reverse ignored)
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
docs/user/ppl/cmd/reverse.rst (1)
64-81: Example 2 uses hardcoded future dates - consider updating for realism.The example shows timestamps from July 2025 (
2025-07-28), which are in the future relative to the current date (November 2025 based on context). While this doesn't affect functionality, using realistic past timestamps or noting these are sample values would improve documentation quality.This is a minor documentation nitpick.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
468-473: Consider future-proofingtestExplainReverseWithTimestampfor configurable time fields
testExplainReverseWithTimestampcurrently assumes@timestampas the time field. With the separate work allowing configurable time fields (e.g., intimechart), you may eventually want a companion explain test that asserts reverse uses the resolved time field rather than hard-coding@timestamp, to prevent regressions when that logic evolves.ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (1)
345-360: Result-order assertion after aggregation may be fragileIn
testReverseAfterSortAndAggregationIsNoOp, the expected result string relies on a specific row order from an aggregation without an explicit ORDER BY. If the underlying engine ever changes its grouping or output-order behavior, this test could fail despite reverse still being a no-op.You might consider either:
- dropping the result-order assertion and only checking the logical plan, or
- adding an explicit ORDER BY in the PPL and adjusting expectations accordingly.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)
180-191: Consider using unordered verification for no-op test.When
reverseis a no-op, the "natural order" is technically undefined and could vary based on shard allocation or segment merges. While freshly-loaded test indices are typically stable, usingverifyDataRows(unordered) or explicitly sorting would make this test more robust against flakiness.- // Without sort or @timestamp, reverse is ignored, so data comes in natural order - // The first 3 documents in natural order (ascending by account_number) - verifyDataRowsInOrder(result, rows(1), rows(6), rows(13)); + // Without sort or @timestamp, reverse is ignored, so data comes in natural (undefined) order + // Just verify we get 3 rows with valid account numbers + verifyDataRows(result, rows(1), rows(6), rows(13));Alternatively, keep the ordered assertion if you're confident the test data loading order is deterministic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java(2 hunks)core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java(2 hunks)docs/category.json(1 hunks)docs/user/ppl/cmd/reverse.rst(5 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java(6 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml(1 hunks)integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml(1 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java(5 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1)
1-13: Let me analyze the original review comment based on the provided information without repository access.Looking at the original review comment, I can make the following observations:
- Content Analysis: The YAML snippet shows valid YAML structure with proper indentation and quoting
- Logical Plan: The logical plan shows proper DESC sort on @timestamp field ($0), which aligns with stated PR objectives
- Physical Plan: The physical plan correctly pushes down the sort operation with proper DESC ordering and
"missing": "_first"configuration- Review Structure: The original review has contradictory elements - it includes
tag while also requestingwith a verification scriptKey Issue: The original review requests verification via shell script execution but includes approval tags. Without access to verify companion test files or run the verification script, I cannot confirm the consistency claims across related test resources.
Test resource shows correct DESC sort on @timestamp with proper push-down to OpenSearch, but verification of consistency with related test files cannot be completed.
The logical and physical plans correctly implement DESC sorting on @timestamp with appropriate missing value handling. However, the requested verification against companion test cases (e.g.,
explain_reverse_ignored.yaml) could not be performed due to inability to access the repository file structure. The YAML syntax appears valid based on static inspection, and the logical progression from LogicalSort through LogicalProject to CalciteLogicalIndexScan is sound.docs/category.json (1)
45-45: LGTM!Documentation entry correctly added in alphabetical order within the
ppl_cli_calcitecategory.integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1)
1-19: LGTM!The expected output correctly demonstrates:
- Sort direction reversal (DESC→ASC for age, ASC→DESC for firstname)
- Null direction reversal (nulls-last↔nulls-first)
- Proper pushdown of reversed sort to OpenSearch with correct JSON structure
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)
594-623: LGTM!The
reverseCollationutility is well-implemented:
- Properly handles null/empty input with early return
- Correctly uses
direction.reverse()API for sort direction- Appropriately flips null direction (FIRST↔LAST, preserving UNSPECIFIED)
- Returns immutable collation via
RelCollations.of()core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)
786-788: The build/push pattern is correct for replacing the current node.The sequence
build()(pop) followed bypush(rebuiltTree)correctly replaces the current node on the RelBuilder stack with the rebuilt tree containing the reversed sort.
690-724: Backtracking logic is well-designed with comprehensive blocking operator checks.The method correctly identifies operators that would invalidate sort ordering:
- Aggregate (destroys row-level ordering)
- BiRel (covers Join, Correlate)
- SetOp (Union, Intersect, Except)
- Uncollect (unnesting)
- LogicalProject with window functions (ordering from window's ORDER BY)
Based on learnings from past review comments, the blocking operators were expanded to cover these edge cases.
735-761: I'm unable to clone the repository to verify the test cases and implementation details directly. However, based on the code snippet provided in the review comment, I can analyze the logical correctness of the concern.The code appears to correctly preserve LIMIT semantics as implemented.
Analysis of the
insertReversedSortInTreemethod:
Tree Structure: The method creates a new
LogicalSorton top of the original sort node (line 753-754), rather than replacing it. The original sort with its fetch/offset constraints remains in the tree.Execution Flow: When a query like
source=t | sort age | head 10 | reverseis executed:
- The original
LogicalSortnode (with fetch=10) applies the limit first- The reversed
LogicalSort(with null fetch/offset) is applied on top, reversing the already-limited result- The limit is preserved because the original sort retains its fetch constraint
Why null fetch/offset is correct: The reversed sort on top doesn't need a fetch/offset because it operates on an already-limited dataset from the original sort. Setting fetch/offset on the reversed sort would be redundant and incorrect.
However, I cannot conclusively verify this claim without access to test cases or the actual query execution flow due to repository access issues. The logical analysis suggests the implementation is sound, but there could be edge cases or runtime considerations not apparent from the code alone.
769-776: Based on my web search findings, I can now provide a verification of the review comment's concern.The review concern about edge cases is valid and well-founded.
The web search confirms that Calcite's
MetadataQuery.collations()can indeed return collations from multiple sources:
- TableScan/RelOptTable declared collations (index or table-level ordering)
- Upstream input collations (from Sort, Project, Calc, Window, Join operators)
- Provider-specific logic (enumerable implementations, planner-specific handlers)
This means your concern is correct: a collation found via
MetadataQuery.collations()may come from:
- An index's natural ordering (no explicit Sort node)
- A table's declared collation
- Upstream operators that preserve collations
When the code calls
relBuilder.sort(reversedCollation), it does add a new sort operation if the collation source isn't an actual Sort node. However, the semantics appear intentionally correct—reversing a known ordering (whether from index, table, or upstream) and applying it via sort is still semantically valid.The issue is verification coverage, not the logic itself. Without access to the repository to check test files, I cannot definitively verify whether all these edge cases (especially index natural ordering without explicit Sort nodes) are covered by tests.
Consider documenting the edge case: collation from metadata query sources beyond Sort nodes.
The code correctly reverses collations from all metadata sources (index ordering, table collations, upstream operators), not just explicit Sort nodes. However, for clarity with the PR's "flip sort directions" documentation, add a comment explaining that
MetadataQuery.collations()may return ordering information from index definitions or table metadata, and that the reversed collation is applied uniformly regardless of source. This would also clarify that new sort operations may be added for index-based orderings.docs/user/ppl/cmd/reverse.rst (1)
14-43: Documentation clearly explains the three-tier behavior and optimization benefits.The description, behavior section, and note effectively communicate:
- The conditional logic based on sort/timestamp presence
- Performance benefits of avoiding materialization
- Memory optimization rationale
This aligns well with the PR objectives of enabling pushdown and avoiding expensive operations on large datasets.
integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)
91-91: Include reverse tests in no-pushdown suite – looks goodAdding
CalciteReverseCommandIT.classhere keeps reverse coverage consistent for both pushdown and no-pushdown modes. No further changes needed.integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1)
1-7: Double-reverse ignored plan matches no-op semanticsLogical/physical plans show no sort introduced for the double
reversecase; only limit/project are applied, which matches the described no-op behavior. Fixture looks consistent.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1)
1-9: No-pushdown double-reverse ignored fixture is consistentPlan omits any sort and only reflects system limit and projection, which aligns with the intended “double reverse is a no-op” behavior under no-pushdown.
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1)
1-10: Single-field reverse pushdown plan matches optimization intentLogical plan shows original + reversed sorts; physical plan collapses to a single pushed-down
age ASCsort with limit, consistent with the described reverse optimization.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1)
1-12: Reverse with @timestamp (no-pushdown) plan matches documented behaviorPlan adds a DESC sort on
@timestampplus the head limit, implemented via EnumerableSort/Limit rather than pushdown, which is exactly what the no-pushdown path is expected to do.integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1)
1-20: Double-reverse multi-field pushdown fixture reflects canceled reversesLogical sorts show the flip–flip pattern, while the physical plan pushes down the original
age DESC/firstname.keyword ASCsort with limit, which aligns with the double-reverse semantics.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1)
1-12: No-pushdown multi-field reverse plan looks correctThe logical plan shows original + reversed sorts; the physical plan retains only the final reversed collation as an
EnumerableSort(no pushdown), which matches the intended behavior for this configuration despite the filename.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1)
1-13: Double-reverse no-pushdown logical/physical shape looks consistentLogical and physical plans correctly reflect two reverses on a single sort (triple LogicalSort wrapper, single physical DESC sort under a LIMIT), matching the intended “double reverse = original ordering” behavior in the no-pushdown path.
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1)
1-12: Single reverse no-pushdown plan matches flipped sort semanticsThe expected logical/physical plans correctly implement
sort - age | reverseas an ASC sort on age at the physical layer, while preserving the intermediate DESC/ASC logical structure.integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1)
1-15: Double-reverse pushdown correctly preserves original DESC sort at indexThe explain output shows triple sorts in the logical plan but a single DESC sort in PushDownContext, so the pushed sort matches the original
sort - ageafter two reverses, as desired.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1)
1-13: Multi-key double-reverse no-pushdown plan preserves original collationFinal physical sort on
(age DESC, firstname ASC)after two reverses matches the expected “last sort wins, double reverse = original” semantics for multi-field sorts.integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1)
1-11: Reverse-no-op (no sort, no @timestamp) is represented correctlyThe plan omits any additional sort for
reverseand only shows thefetch=[5]limit/sort forhead 5, matching the “reverse ignored” behavior in this context.integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1)
1-8: Pushdown reverse-no-op keeps only LIMITs in contextThe pushed plan correctly omits any reverse sort and carries both the explicit
head 5and globalQUERY_SIZE_LIMITas LIMIT entries, withsize=5in the request builder.integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
417-473: Reverse explain coverage for sort/no-sort/timestamp looks solidThe new ITs exercise the key reverse scenarios (ignored w/o sort/@timestamp, single & multi-field pushdown, double reverse, and
@timestamp-driven sort) against explain output, which lines up well with the documented behavior matrix for this PR.ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5)
12-23: Reverse behavior Javadoc is clear and matches the implemented test matrixThe class-level documentation concisely captures the three reverse modes (flip existing collation, use
@timestamp, no-op) and points to integration tests for the non-collation cases, which aligns well with the scenarios exercised below.
159-240: Planner tests for multiple/multi-field sorts with reverse are well-structuredThe tests from
testMultipleSortsWithReverseParserSuccessthroughtestReverseWithFieldsAndSortParserSuccessaccurately encode the expected logical plans and Spark SQL for:
- multiple sequential sorts where reverse targets the last one,
- multi-field collations with direction flipping, and
- interaction with
fieldsprojections.These should give good safety nets for future refactors of the reverse/backtracking logic.
242-290: Head-then-sort-then-reverse no-opt test correctly guards semantics
testHeadThenSortReverseNoOptandtestSortFieldsReverseexplicitly assert the presence and ordering of the three LogicalSort nodes (fetch, sort, reverse) and the backtracking case where the sort key is projected away, which is important to prevent “helpful” optimizations that would silently change PPL semantics.
297-360: Reverse-no-op tests after aggregation/join match the “collation destroyed” ruleThe tests asserting that reverse is ignored after aggregation and joins, and after
sortfollowed by aggregation, correctly expect no additional LogicalSort node and no ORDER BY in the generated SQL, demonstrating that reverse doesn’t try to infer collation past blocking operators.
362-458: Reverse through filters/eval/join+sort is comprehensively coveredThe remaining tests (reverse after where/eval/multiple filters, sort–join–sort–reverse, and reverse on a post-aggregation sort) collectively validate the backtracking strategy and ensure reverse only inverts collations where they’re still semantically valid, which is exactly what this optimization needs.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (8)
8-32: LGTM!The import additions and setup modifications properly support the new test cases. Loading
TIME_TEST_DATAfor@timestampfield tests andSTATE_COUNTRYforstreamstatstests is appropriate.
35-110: LGTM!The existing tests are correctly updated to include explicit
sortcommands beforereverse, aligning with the new pushdown-friendly implementation. The expected results remain consistent, validating the optimization preserves correctness.
193-216: LGTM!This test correctly validates that when no explicit sort exists but
@timestampis present,reverseadds a@timestamp DESCsort. The expected values align with the time-series test data.Minor note: The comment on line 208 mentions "IDs 100, 99, 98, 97, 96" which may be outdated or confusing since the test is ordered by timestamp, not IDs.
218-229: LGTM!This test correctly validates that explicit sort takes precedence over
@timestampfallback. The reverse operation flips thesort valuedirection rather than applying@timestamp DESC.
231-338: LGTM - addresses past review concerns about streamstats.These tests comprehensively cover the
streamstats | reversescenarios discussed in previous reviews:
- No-op when
__stream_seq__is projected out and not detectable- Working reverse when backtracking finds collation (partition case)
- Working reverse with explicit sort after streamstats
This aligns with the backtracking implementation mentioned in the past review discussion.
358-372: Inconsistency between expected behavior and verification method.The comment on line 367-368 states that with explicit sort and reverse, data should be in descending gender order (M, F). However, line 371 uses
verifyDataRows(unordered) rather thanverifyDataRowsInOrder.If
reverseis working correctly aftersort gender, the output order should be deterministic and you should use ordered verification:- // Note: Due to column reordering after stats (c, gender), the result order - // may differ from expected. Using unordered verification for robustness. - verifyDataRows(result, rows(4, "M"), rows(3, "F")); + // With explicit sort and reverse, data is in descending gender order: M, F + verifyDataRowsInOrder(result, rows(4, "M"), rows(3, "F"));If there's a known issue preventing ordered output after aggregation+sort+reverse, please clarify in the comment.
389-440: LGTM!These tests properly validate that the backtracking logic traverses through non-blocking operators (filters, eval/project) to find the underlying sort collation. The expected results correctly reflect the filtered data in reversed sort order.
442-456: LGTM!This test correctly validates that the
@timestampfallback only applies when@timestampis present in the output schema. Since aggregation (stats count() as c by category) doesn't include@timestamp, reverse correctly becomes a no-op.
| | **1. With existing sort**: Reverses the sort direction(s) | ||
| | **2. With @timestamp field (no explicit sort)**: Sorts by @timestamp in descending order | ||
| | **3. Without sort or @timestamp**: The command is ignored (no effect) | ||
| ============ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix misplaced RST heading underline.
Line 19 has a stray ============ that appears to be a formatting artifact. In RST, heading underlines should directly follow heading text. This line should likely be removed or the preceding content restructured.
| **2. With @timestamp field (no explicit sort)**: Sorts by @timestamp in descending order
| **3. Without sort or @timestamp**: The command is ignored (no effect)
-============
Behavior
========🤖 Prompt for AI Agents
In docs/user/ppl/cmd/reverse.rst around line 19, remove the stray line
containing only "============" (or move it to directly underline a preceding
heading) because it is a misplaced RST heading underline; ensure any heading
underline in this file directly follows the heading text and that no standalone
underline artifacts remain.
| public void testReverseWithDescendingSort() throws IOException { | ||
| // Test reverse with descending sort (- age) | ||
| JSONObject result = | ||
| executeQuery( | ||
| String.format( | ||
| "source=%s | sort account_number | fields account_number | reverse | head 3", | ||
| "source=%s | sort - account_number | fields account_number | reverse", | ||
| TEST_INDEX_BANK)); | ||
| verifySchema(result, schema("account_number", "bigint")); | ||
| verifyDataRowsInOrder(result, rows(32), rows(25), rows(20)); | ||
| verifyDataRowsInOrder( | ||
| result, rows(1), rows(6), rows(13), rows(18), rows(20), rows(25), rows(32)); | ||
| } | ||
|
|
||
| @Test | ||
| public void testReverseWithMixedSortDirections() throws IOException { | ||
| // Test reverse with mixed sort directions (- age, + firstname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix comments to match the actual sort fields.
The comments reference age but the code sorts by account_number:
- Line 114: "Test reverse with descending sort (- age)"
- Line 127: "Test reverse with mixed sort directions (- age, + firstname)"
@Test
public void testReverseWithDescendingSort() throws IOException {
- // Test reverse with descending sort (- age)
+ // Test reverse with descending sort (- account_number)
JSONObject result = @Test
public void testReverseWithMixedSortDirections() throws IOException {
- // Test reverse with mixed sort directions (- age, + firstname)
+ // Test reverse with mixed sort directions (- account_number, + firstname)
JSONObject result =🤖 Prompt for AI Agents
In
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 113 to 127, update the misleading inline comments that mention
"age" to reflect the actual sort field(s) used in the tests: change the comment
on line 114 to indicate a descending sort on account_number (e.g., "Test reverse
with descending sort (- account_number)") and change the comment on line 127 to
match the mixed sort fields used in that test (replace "- age, + firstname" with
the actual sort expression present in the test, e.g., "- account_number, +
firstname" or the correct fields), keeping comment style consistent with
surrounding tests.
| @Test | ||
| public void testDoubleReverseWithDescendingSort() throws IOException { | ||
| // Test double reverse with descending sort (- age) | ||
| JSONObject result = | ||
| executeQuery( | ||
| String.format( | ||
| "source=%s | sort - account_number | fields account_number | reverse | reverse", | ||
| TEST_INDEX_BANK)); | ||
| verifySchema(result, schema("account_number", "bigint")); | ||
| verifyDataRowsInOrder( | ||
| result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1)); | ||
| } | ||
|
|
||
| @Test | ||
| public void testDoubleReverseWithMixedSortDirections() throws IOException { | ||
| // Test double reverse with mixed sort directions (- age, + firstname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment inconsistency: "age" should be "account_number".
Lines 148 and 161 also reference "- age" but the code uses "- account_number".
@Test
public void testDoubleReverseWithDescendingSort() throws IOException {
- // Test double reverse with descending sort (- age)
+ // Test double reverse with descending sort (- account_number)
JSONObject result = @Test
public void testDoubleReverseWithMixedSortDirections() throws IOException {
- // Test double reverse with mixed sort directions (- age, + firstname)
+ // Test double reverse with mixed sort directions (- account_number, + firstname)
JSONObject result =📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Test | |
| public void testDoubleReverseWithDescendingSort() throws IOException { | |
| // Test double reverse with descending sort (- age) | |
| JSONObject result = | |
| executeQuery( | |
| String.format( | |
| "source=%s | sort - account_number | fields account_number | reverse | reverse", | |
| TEST_INDEX_BANK)); | |
| verifySchema(result, schema("account_number", "bigint")); | |
| verifyDataRowsInOrder( | |
| result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1)); | |
| } | |
| @Test | |
| public void testDoubleReverseWithMixedSortDirections() throws IOException { | |
| // Test double reverse with mixed sort directions (- age, + firstname) | |
| @Test | |
| public void testDoubleReverseWithDescendingSort() throws IOException { | |
| // Test double reverse with descending sort (- account_number) | |
| JSONObject result = | |
| executeQuery( | |
| String.format( | |
| "source=%s | sort - account_number | fields account_number | reverse | reverse", | |
| TEST_INDEX_BANK)); | |
| verifySchema(result, schema("account_number", "bigint")); | |
| verifyDataRowsInOrder( | |
| result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1)); | |
| } | |
| @Test | |
| public void testDoubleReverseWithMixedSortDirections() throws IOException { | |
| // Test double reverse with mixed sort directions (- account_number, + firstname) |
🤖 Prompt for AI Agents
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 146 to 161: the inline test comments incorrectly refer to "- age"
while the test actually sorts by account_number; update the two comments at
lines 148 and 161 to reference "- account_number" (or a neutral description like
"descending account_number") so the comment matches the code under test.
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java
Show resolved
Hide resolved
Signed-off-by: Kai Huang <ahkcs@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)
114-116: Inline comments still refer to “age” while tests sort byaccount_numberThe four comments in the descending/mixed-sort tests still mention “age”, but the queries all sort on
account_number:
- Line 115:
// Test reverse with descending sort (- age)- Line 128:
// Test reverse with mixed sort directions (- age, + firstname)- Line 149:
// Test double reverse with descending sort (- age)- Line 162:
// Test double reverse with mixed sort directions (- age, + firstname)This can mislead readers when debugging test failures.
Consider updating them along these lines:
- // Test reverse with descending sort (- age) + // Test reverse with descending sort (- account_number) ... - // Test reverse with mixed sort directions (- age, + firstname) + // Test reverse with mixed sort directions (- account_number, + firstname) ... - // Test double reverse with descending sort (- age) + // Test double reverse with descending sort (- account_number) ... - // Test double reverse with mixed sort directions (- age, + firstname) + // Test double reverse with mixed sort directions (- account_number, + firstname)Also applies to: 128-129, 149-150, 162-163
🧹 Nitpick comments (4)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (4)
289-307: Comment intestStreamstatsByWithReversecontradicts the expected reversed orderingThe leading comment says:
// Test that reverse is ignored after streamstats with partitioning (by clause)but the assertion below uses
verifyDataRowsInOrder(...)with the rows clearly reversed by__stream_seq__within eachcountry:// With backtracking, reverse now works and reverses the __stream_seq__ order verifyDataRowsInOrder( result, rows("Jane", "Canada", ..., 20, 2, 22.5), rows("John", "Canada", ..., 25, 1, 25), rows("Hello", "USA", ..., 30, 2, 50), rows("Jake", "USA", ..., 70, 1, 70));To avoid confusion, the first comment should reflect that reverse does work here, for example:
- // Test that reverse is ignored after streamstats with partitioning (by clause) + // Test that reverse backtracks through streamstats-by and reverses __stream_seq__ order
445-456: Clarify intent intestReverseWithTimestampAfterAggregationcommentThe header says:
// Test that reverse uses @timestamp when aggregation destroys collation // TIME_TEST_DATA has @timestamp fieldbut the body and expectations assert the opposite:
// Even though aggregation destroys collation, there's no @timestamp in the // aggregated result, so reverse is a no-op verifyDataRows(result, rows(26, "A"), rows(25, "B"), rows(25, "C"), rows(24, "D"));Since the query
stats count() as c by category | reversedrops@timestampfrom the result, reverse is intentionally a no-op here.Consider rephrasing the header to match the actual behavior, e.g.:
- // Test that reverse uses @timestamp when aggregation destroys collation - // TIME_TEST_DATA has @timestamp field + // Test that reverse is a no-op after aggregation, even if the source index has @timestamp + // TIME_TEST_DATA has @timestamp, but the aggregated result no longer exposes it
359-373:testReverseAfterAggregationWithSortdoesn’t actually validate the reversed orderingThe comments state that reverse should produce descending gender order:
// With explicit sort and reverse, data should be in descending gender order // Sort by gender ASC: F, M -> Reverse: M, Fbut the assertion uses
verifyDataRows(...), which ignores row order and only checks the multiset of rows:verifyDataRows(result, rows(4, "M"), rows(3, "F"));If the goal is to verify that reverse correctly flips the post-aggregation sort, consider switching to
verifyDataRowsInOrder(...)and tightening the comment; otherwise, adjust the comment to say you’re only validating the result set, not its ordering.This is test quality, not functionality, so can be deferred but would strengthen coverage of the reverse behavior.
498-515:testTimechartWithGroupByAndReversecomment claims reversed order, but the assertion is unorderedThe test description:
// Test timechart with group by and reverse // The sort is on both @timestamp and the group by field ... // All events are in the same hour, so only one time bucket // Hosts are grouped and results are reversedsuggests you expect a specific row ordering after reverse, but the assertion uses
verifyDataRows(...), which doesn’t check order:verifyDataRows( result, rows("2024-07-01 00:00:00", "db-01", 1), rows("2024-07-01 00:00:00", "web-01", 2), rows("2024-07-01 00:00:00", "web-02", 2));Given timechart appends an explicit ORDER BY and reverse adds another, the final ordering should be deterministic here. To actually exercise that, you could:
- Use
verifyDataRowsInOrder(...)with the expected reversed order, or- Relax the comment to just say you’re validating the grouped result content.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java(6 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java(1 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java(3 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1)
222-250: NewtestStreamstatsWithReverseexpectations look consistent with non-null-bucket semanticsThe logical and Spark SQL plans correctly model
reverseas an added DESC sort on__stream_seq__on top of the existing ASC sort fromstreamstats, and the aggregation uses the same non-null-bucket MAX window shape astestStreamstatsBy. Looks good.ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (1)
53-78: Events table schema and timechart+reverse tests are coherent
- The added
created_atTIMESTAMP column is wired consistently: row literals,EventsTableprotoRowType, and the newtimefield=created_attest all agree on column ordering and types.testTimechartWithReverseandtestTimechartWithCustomTimefieldAndReversecorrectly expect:
- Inner ASC sort on the appropriate bucket (
SPAN(@timestamp, ...)orSPAN(created_at, ...)) fromtimechart.- An outer DESC sort from
reverse, with NULLS flipped toNULLS FIRST.
This aligns with the PR’s “tier 1” behavior for reversing existing collations.Also applies to: 370-425, 437-445
|
This PR is stalled because it has been open for 2 weeks with no activity. |
Description
Originally from #4056 by @selsong
This PR implements a significant performance optimization for the
reversecommand by eliminating the expensive ROW_NUMBER() window function and implementing a three-tier logic based on query context.Motivation
The previous implementation used ROW_NUMBER() window function which:
Solution: Three-Tier Reverse Logic
The
reversecommand now follows context-aware behavior:Implementation Details
1. Reverse with Explicit Sort (Primary Use Case)
Query:
Behavior: Flips all sort directions:
+balance, -firstname→-balance, +firstnameLogical Plan:
Physical Plan: (efficiently pushes reversed sort to OpenSearch)
2. Reverse with @timestamp (Time-Series Optimization)
Query:
Behavior: When no explicit sort exists but the index has an @timestamp field, reverse automatically sorts by @timestamp DESC to show most recent events first.
Use Case: Common pattern in log analysis - users want recent logs first
Logical Plan:
3. Reverse Ignored (No-Op Case)
Query:
Behavior: When there's no explicit sort AND no @timestamp field, reverse is ignored. Results appear in natural index order.
Rationale: Avoid expensive operations when reverse has no meaningful semantic interpretation.
Logical Plan:
Note: No sort node is added - reverse is completely ignored.
4. Double Reverse (Cancellation)
Query:
Behavior: Two reverses cancel each other out, returning to original sort order.
Logical Plan:
Final sort order matches original query:
+balance, -firstname5. Multiple Sorts + Reverse
Query:
Behavior: Reverse applies to the most recent sort (from PPL semantics, last sort wins).
Logical Plan:
Result: Only
firstnamesort is reversed (DESC → ASC). Thebalancesort is overridden by PPL's "last sort wins" rule.Related Issues
Resolves #3924
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.