Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Nov 10, 2025

Description

Originally from #4056 by @selsong

This PR implements a significant performance optimization for the reverse command by eliminating the expensive ROW_NUMBER() window function and implementing a three-tier logic based on query context.

Motivation

The previous implementation used ROW_NUMBER() window function which:

  • Required materializing the entire dataset
  • Caused excessive memory usage
  • Failed on large datasets (100M+ records) with "insufficient resources" errors

Solution: Three-Tier Reverse Logic

The reverse command now follows context-aware behavior:

  1. With existing sort/collation: Reverses all sort directions (ASC ↔ DESC)
  2. With @timestamp field (no explicit sort): Sorts by @timestamp in descending order
  3. Without sort or @timestamp: The command is ignored (no-op)

Implementation Details

1. Reverse with Explicit Sort (Primary Use Case)

Query:

source=accounts | sort +balance, -firstname | reverse

Behavior: Flips all sort directions: +balance, -firstname-balance, +firstname

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[DESC-nulls-last], dir1=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Physical Plan: (efficiently pushes reversed sort to OpenSearch)

CalciteEnumerableIndexScan(table=[[OpenSearch, accounts]],
  PushDownContext=[[..., SORT->[
    {"balance": {"order": "desc", "missing": "_last"}},
    {"firstname.keyword": {"order": "asc", "missing": "_first"}}
  ], LIMIT->10000]])

2. Reverse with @timestamp (Time-Series Optimization)

Query:

source=time_series_logs | reverse | head 100

Behavior: When no explicit sort exists but the index has an @timestamp field, reverse automatically sorts by @timestamp DESC to show most recent events first.

Use Case: Common pattern in log analysis - users want recent logs first

Logical Plan:

LogicalSystemLimit(sort0=[$0], dir0=[DESC], fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(@timestamp=[$0], category=[$1], value=[$2])
    LogicalSort(sort0=[$0], dir0=[DESC])
      CalciteLogicalIndexScan(table=[[OpenSearch, time_data]])

3. Reverse Ignored (No-Op Case)

Query:

source=accounts | reverse | head 100

Behavior: When there's no explicit sort AND no @timestamp field, reverse is ignored. Results appear in natural index order.

Rationale: Avoid expensive operations when reverse has no meaningful semantic interpretation.

Logical Plan:

LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Note: No sort node is added - reverse is completely ignored.


4. Double Reverse (Cancellation)

Query:

source=accounts | sort +balance, -firstname | reverse | reverse

Behavior: Two reverses cancel each other out, returning to original sort order.

Logical Plan:

LogicalSystemLimit(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$3], sort1=[$1], dir0=[ASC-nulls-first], dir1=[DESC-nulls-last])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Final sort order matches original query: +balance, -firstname


5. Multiple Sorts + Reverse

Query:

source=accounts | sort +balance | sort -firstname | reverse

Behavior: Reverse applies to the most recent sort (from PPL semantics, last sort wins).

Logical Plan:

LogicalSystemLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[10000])
  LogicalProject(account_number=[$0], firstname=[$1], ...)
    LogicalSort(sort0=[$1], dir0=[ASC-nulls-first])
      CalciteLogicalIndexScan(table=[[OpenSearch, accounts]])

Result: Only firstname sort is reversed (DESC → ASC). The balance sort is overridden by PPL's "last sort wins" rule.


Related Issues

Resolves #3924

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Enhanced reverse command: flips existing sort directions when possible, backtracks to apply reversed sort when needed, falls back to sorting by @timestamp DESC if no explicit sort, or becomes a no-op when neither applies.
  • Documentation

    • Expanded reverse command docs with explicit three-tier rules, examples (multi-field sorts, head), and notes on performance/behavior.
  • Tests

    • Added extensive integration and unit tests covering pushdown, double-reverse, timestamp fallback, no-op scenarios, and complex pipelines.

✏️ Tip: You can customize this high-level summary in your review settings.

Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: I recall the major comment on original PR is early optimization in analyzer layer. Is this new PR trying to address the concern? Ref: #4056 (comment)

@ahkcs
Copy link
Contributor Author

ahkcs commented Nov 11, 2025

QQ: I recall the major comment on original PR is early optimization in analyzer layer. Is this new PR trying to address the concern? Ref: #4056 (comment)

Hi Chen, I think that's a valid concern. However, after trying it out, I think it has significant complexity comparing to the current approach. I think CalciteRelNodeVisitor is used as a logical plan builder that constructs the logical representation of the query, so I think optimization can also happen here. In our approach, our visitReverse is choosing LogicalSort(reversed) vs LogicalSort(ROW_NUMBER), and I think this is appropriate for logical plan builder. If we moved the optimization to Calcite rule, we'd be doing something more complex - starting with a naive representation (always ROW_NUMBER) and rewriting it. That adds significant complexity.

@ahkcs ahkcs requested a review from dai-chen November 11, 2025 22:29
Copy link
Collaborator

@noCharger noCharger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add benchmark results on before VS after?

@ahkcs ahkcs requested a review from noCharger November 14, 2025 18:19
@ahkcs ahkcs force-pushed the feat/reverse_optimization branch 2 times, most recently from 9fee6b2 to 13d798d Compare November 24, 2025 19:28
Comment on lines 684 to 686
if (node instanceof Aggregate
|| node instanceof org.apache.calcite.rel.core.Join
|| node instanceof Correlate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified to Aggregate || BiRel || SetOp. It will cover more cases including set operations like union.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! I go through all possible RelNodes built by RelBuilder and figure out more corner cases used in current PPL. We need to stop backtracking as well by then.

  1. Uncollect RelNode (it's usually used as a child of LogicalCorrelate, so we should be fine in most cases)
  2. When project contains window functions, aka RexOver, the ordering will be determined by window function's order by and partitions. So we should not destroy its output order as well.

For the other part, it seems fine for now. Please also add some complex query combined with different operators in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, also added UTs and ITs for different operators

@songkant-aws
Copy link
Contributor

LGTM. Please get other signoffs.

@yuancu
Copy link
Collaborator

yuancu commented Nov 26, 2025

Hi @ahkcs , #4784 allows user to specify a timestamp field in timechart command. For this specific case, maybe we need to use the specified timefield instead of the hard-coded @timestamp.

Although I doubt that there isn't much impact because all timechart commands have a sort at the end of their plans, making them to fall into your first tier. Can you please double check?

ahkcs added 15 commits November 26, 2025 10:51
This commit optimizes the `reverse` command in the Calcite planner by
intelligently reversing existing sort collations instead of always using
the ROW_NUMBER() approach.

Key changes:
- Added PlanUtils.reverseCollation() method to flip sort directions and
  null directions
- Updated CalciteRelNodeVisitor.visitReverse() to:
  - Check for existing sort collations
  - Reverse them if present (more efficient)
  - Fall back to ROW_NUMBER() when no sort exists
- Added comprehensive integration test expected outputs for:
  - Single field reverse pushdown
  - Multiple field reverse pushdown
  - Reverse fallback cases
  - Double reverse no-op optimizations

This optimization significantly improves performance when reversing
already-sorted data by leveraging database-native sort reversal.

Based on PR opensearch-project#4056 by @selsong

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	docs/user/ppl/cmd/reverse.rst
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feat/reverse_optimization branch from 631a24d to 8977a8f Compare November 26, 2025 19:05
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

This change implements a collation-based reverse pushdown for Calcite plans: it backtracks the RelNode tree to find Sort collations (respecting blocking operators), reverses or inserts reversed Sorts (or falls back to @timestamp DESC or no-op), replaces the prior ROW_NUMBER-based approach, and adds tests and docs.

Changes

Cohort / File(s) Summary
Core logic
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Adds backtracking to find Sort collations, methods backtrackForCollation and insertReversedSortInTree, replaces ROW_NUMBER-based reverse with collation-based reversal/insertion and timestamp fallback; imports Calcite types (BiRel, SetOp, Uncollect, RelCollation, LogicalProject).
Utility
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java
Adds public static reverseCollation(RelCollation) to flip field directions and null order; imports Calcite collation classes.
Documentation
docs/category.json, docs/user/ppl/cmd/reverse.rst
Adds docs entry and page describing three-tier reverse behavior (flip existing sort, @timestamp DESC fallback, or no-op), examples, and version bump.
Test suite registration
integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
Includes CalciteReverseCommandIT in the no-pushdown integration test suite.
Calcite explain tests
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
Expands and renames explain tests: multiple new methods covering pushdown, double-reverse, multi-field sorts, and timestamp cases.
Reverse integration tests
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
Adds many integration tests (descending/mixed sorts, double reverse, timestamp behaviors, streamstats/timechart interactions, aggregation/filter edge cases) and related test data imports.
PPL unit tests
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java, ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java, ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java
Adds numerous PPL tests for reverse with multiple/multi-field sorts, head/where/agg/join interactions, streamstats/timechart reverse scenarios; adds created_at TIMESTAMP column to EventsTable test data.
Expected explain outputs (calcite)
integ-test/src/test/resources/expectedOutput/calcite/*.yaml
Adds multiple expected YAML explain plans for reverse pushdown/no-pushdown, double reverse, multi-field, and timestamp scenarios.
Expected explain outputs (calcite_no_pushdown)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/*.yaml
Adds corresponding no-pushdown expected YAML explain plans.

Sequence Diagram(s)

sequenceDiagram
    participant Q as Query (reverse)
    participant V as CalciteRelNodeVisitor
    participant B as Backtracker
    participant T as RelNode Tree
    participant U as PlanUtils
    participant P as Final Plan

    Q->>V: visit reverse node
    V->>B: check current Sort collation
    alt direct Sort with collation
        B->>U: reverseCollation(existing)
        U-->>V: reversed collation
        V->>P: apply reversed collation to Sort
    else no direct Sort
        B->>T: walk ancestors until Sort or blocker
        T-->>B: found Sort / blocker / none
        alt ancestor Sort found (before blocker)
            B->>T: rebuild subtree inserting reversed Sort
            T-->>P: tree with inserted reversed Sort
        else no Sort found
            B->>T: check row type for `@timestamp`
            alt `@timestamp` present
                T->>P: insert Sort on `@timestamp` DESC
            else
                T->>P: no-op (reverse ignored)
            end
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pay attention to CalciteRelNodeVisitor tree traversal and rewrite: boundary conditions, blocker operator handling (Aggregate, BiRel, SetOp, Uncollect), and correctness of insertion point.
  • Verify PlanUtils.reverseCollation preserves null direction semantics and field ordering.
  • Review extensive new integration/unit tests and expected YAML fixtures for consistency with intended pushdown/no-op rules.
  • Confirm EventsTable test schema change (added timestamp column) does not break unrelated tests.

"🐰
I hopped the RelNode tree with care,
Flipped collations here and there,
If no sort found I seek the time,
Else I leave the rows in rhyme,
A tiny hop to pushdown's chime."

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.68% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'Implement reverse performance optimization' clearly and concisely describes the main change: replacing the ROW_NUMBER-based reverse implementation with a context-aware, pushdown-friendly optimization that avoids materializing large datasets.
Linked Issues check ✅ Passed The PR implementation fully addresses all primary objectives from issue #3924: detects logical-plan patterns for pushdown, flips sort direction when applicable [CalciteRelNodeVisitor, PlanUtils], ensures double-reverse is no-op, and gracefully falls back when pushdown conditions aren't met (timestamp-based or no-op tier) [CalciteRelNodeVisitor backtracking logic].
Out of Scope Changes check ✅ Passed All changes are directly scoped to the reverse optimization feature: core logic (CalciteRelNodeVisitor, PlanUtils), comprehensive tests (CalciteReverseCommandIT, CalciteExplainIT, CalcitePPLReverseTest), expected outputs, and documentation updates. No unrelated modifications detected.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
docs/user/ppl/cmd/reverse.rst (1)

64-81: Example 2 uses hardcoded future dates - consider updating for realism.

The example shows timestamps from July 2025 (2025-07-28), which are in the future relative to the current date (November 2025 based on context). While this doesn't affect functionality, using realistic past timestamps or noting these are sample values would improve documentation quality.

This is a minor documentation nitpick.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

468-473: Consider future-proofing testExplainReverseWithTimestamp for configurable time fields

testExplainReverseWithTimestamp currently assumes @timestamp as the time field. With the separate work allowing configurable time fields (e.g., in timechart), you may eventually want a companion explain test that asserts reverse uses the resolved time field rather than hard-coding @timestamp, to prevent regressions when that logic evolves.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (1)

345-360: Result-order assertion after aggregation may be fragile

In testReverseAfterSortAndAggregationIsNoOp, the expected result string relies on a specific row order from an aggregation without an explicit ORDER BY. If the underlying engine ever changes its grouping or output-order behavior, this test could fail despite reverse still being a no-op.

You might consider either:

  • dropping the result-order assertion and only checking the logical plan, or
  • adding an explicit ORDER BY in the PPL and adjusting expectations accordingly.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)

180-191: Consider using unordered verification for no-op test.

When reverse is a no-op, the "natural order" is technically undefined and could vary based on shard allocation or segment merges. While freshly-loaded test indices are typically stable, using verifyDataRows (unordered) or explicitly sorting would make this test more robust against flakiness.

-    // Without sort or @timestamp, reverse is ignored, so data comes in natural order
-    // The first 3 documents in natural order (ascending by account_number)
-    verifyDataRowsInOrder(result, rows(1), rows(6), rows(13));
+    // Without sort or @timestamp, reverse is ignored, so data comes in natural (undefined) order
+    // Just verify we get 3 rows with valid account numbers
+    verifyDataRows(result, rows(1), rows(6), rows(13));

Alternatively, keep the ordered assertion if you're confident the test data loading order is deterministic.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and 8977a8f.

📒 Files selected for processing (23)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (2 hunks)
  • core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (2 hunks)
  • docs/category.json (1 hunks)
  • docs/user/ppl/cmd/reverse.rst (5 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1 hunks)
  • integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (36)
integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_with_timestamp.yaml (1)

1-13: Let me analyze the original review comment based on the provided information without repository access.

Looking at the original review comment, I can make the following observations:

  1. Content Analysis: The YAML snippet shows valid YAML structure with proper indentation and quoting
  2. Logical Plan: The logical plan shows proper DESC sort on @timestamp field ($0), which aligns with stated PR objectives
  3. Physical Plan: The physical plan correctly pushes down the sort operation with proper DESC ordering and "missing": "_first" configuration
  4. Review Structure: The original review has contradictory elements - it includes tag while also requesting with a verification script

Key Issue: The original review requests verification via shell script execution but includes approval tags. Without access to verify companion test files or run the verification script, I cannot confirm the consistency claims across related test resources.


Test resource shows correct DESC sort on @timestamp with proper push-down to OpenSearch, but verification of consistency with related test files cannot be completed.

The logical and physical plans correctly implement DESC sorting on @timestamp with appropriate missing value handling. However, the requested verification against companion test cases (e.g., explain_reverse_ignored.yaml) could not be performed due to inability to access the repository file structure. The YAML syntax appears valid based on static inspection, and the logical progression from LogicalSort through LogicalProject to CalciteLogicalIndexScan is sound.

docs/category.json (1)

45-45: LGTM!

Documentation entry correctly added in alphabetical order within the ppl_cli_calcite category.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_multiple.yaml (1)

1-19: LGTM!

The expected output correctly demonstrates:

  • Sort direction reversal (DESC→ASC for age, ASC→DESC for firstname)
  • Null direction reversal (nulls-last↔nulls-first)
  • Proper pushdown of reversed sort to OpenSearch with correct JSON structure
core/src/main/java/org/opensearch/sql/calcite/utils/PlanUtils.java (1)

594-623: LGTM!

The reverseCollation utility is well-implemented:

  • Properly handles null/empty input with early return
  • Correctly uses direction.reverse() API for sort direction
  • Appropriately flips null direction (FIRST↔LAST, preserving UNSPECIFIED)
  • Returns immutable collation via RelCollations.of()
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (4)

786-788: The build/push pattern is correct for replacing the current node.

The sequence build() (pop) followed by push(rebuiltTree) correctly replaces the current node on the RelBuilder stack with the rebuilt tree containing the reversed sort.


690-724: Backtracking logic is well-designed with comprehensive blocking operator checks.

The method correctly identifies operators that would invalidate sort ordering:

  • Aggregate (destroys row-level ordering)
  • BiRel (covers Join, Correlate)
  • SetOp (Union, Intersect, Except)
  • Uncollect (unnesting)
  • LogicalProject with window functions (ordering from window's ORDER BY)

Based on learnings from past review comments, the blocking operators were expanded to cover these edge cases.


735-761: I'm unable to clone the repository to verify the test cases and implementation details directly. However, based on the code snippet provided in the review comment, I can analyze the logical correctness of the concern.

The code appears to correctly preserve LIMIT semantics as implemented.

Analysis of the insertReversedSortInTree method:

  1. Tree Structure: The method creates a new LogicalSort on top of the original sort node (line 753-754), rather than replacing it. The original sort with its fetch/offset constraints remains in the tree.

  2. Execution Flow: When a query like source=t | sort age | head 10 | reverse is executed:

    • The original LogicalSort node (with fetch=10) applies the limit first
    • The reversed LogicalSort (with null fetch/offset) is applied on top, reversing the already-limited result
    • The limit is preserved because the original sort retains its fetch constraint
  3. Why null fetch/offset is correct: The reversed sort on top doesn't need a fetch/offset because it operates on an already-limited dataset from the original sort. Setting fetch/offset on the reversed sort would be redundant and incorrect.

However, I cannot conclusively verify this claim without access to test cases or the actual query execution flow due to repository access issues. The logical analysis suggests the implementation is sound, but there could be edge cases or runtime considerations not apparent from the code alone.


769-776: Based on my web search findings, I can now provide a verification of the review comment's concern.

The review concern about edge cases is valid and well-founded.

The web search confirms that Calcite's MetadataQuery.collations() can indeed return collations from multiple sources:

  1. TableScan/RelOptTable declared collations (index or table-level ordering)
  2. Upstream input collations (from Sort, Project, Calc, Window, Join operators)
  3. Provider-specific logic (enumerable implementations, planner-specific handlers)

This means your concern is correct: a collation found via MetadataQuery.collations() may come from:

  • An index's natural ordering (no explicit Sort node)
  • A table's declared collation
  • Upstream operators that preserve collations

When the code calls relBuilder.sort(reversedCollation), it does add a new sort operation if the collation source isn't an actual Sort node. However, the semantics appear intentionally correct—reversing a known ordering (whether from index, table, or upstream) and applying it via sort is still semantically valid.

The issue is verification coverage, not the logic itself. Without access to the repository to check test files, I cannot definitively verify whether all these edge cases (especially index natural ordering without explicit Sort nodes) are covered by tests.


Consider documenting the edge case: collation from metadata query sources beyond Sort nodes.

The code correctly reverses collations from all metadata sources (index ordering, table collations, upstream operators), not just explicit Sort nodes. However, for clarity with the PR's "flip sort directions" documentation, add a comment explaining that MetadataQuery.collations() may return ordering information from index definitions or table metadata, and that the reversed collation is applied uniformly regardless of source. This would also clarify that new sort operations may be added for index-based orderings.

docs/user/ppl/cmd/reverse.rst (1)

14-43: Documentation clearly explains the three-tier behavior and optimization benefits.

The description, behavior section, and note effectively communicate:

  • The conditional logic based on sort/timestamp presence
  • Performance benefits of avoiding materialization
  • Memory optimization rationale

This aligns well with the PR objectives of enabling pushdown and avoiding expensive operations on large datasets.

integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

91-91: Include reverse tests in no-pushdown suite – looks good

Adding CalciteReverseCommandIT.class here keeps reverse coverage consistent for both pushdown and no-pushdown modes. No further changes needed.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_ignored.yaml (1)

1-7: Double-reverse ignored plan matches no-op semantics

Logical/physical plans show no sort introduced for the double reverse case; only limit/project are applied, which matches the described no-op behavior. Fixture looks consistent.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_ignored.yaml (1)

1-9: No-pushdown double-reverse ignored fixture is consistent

Plan omits any sort and only reflects system limit and projection, which aligns with the intended “double reverse is a no-op” behavior under no-pushdown.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_pushdown_single.yaml (1)

1-10: Single-field reverse pushdown plan matches optimization intent

Logical plan shows original + reversed sorts; physical plan collapses to a single pushed-down age ASC sort with limit, consistent with the described reverse optimization.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_with_timestamp.yaml (1)

1-12: Reverse with @timestamp (no-pushdown) plan matches documented behavior

Plan adds a DESC sort on @timestamp plus the head limit, implemented via EnumerableSort/Limit rather than pushdown, which is exactly what the no-pushdown path is expected to do.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_multiple.yaml (1)

1-20: Double-reverse multi-field pushdown fixture reflects canceled reverses

Logical sorts show the flip–flip pattern, while the physical plan pushes down the original age DESC / firstname.keyword ASC sort with limit, which aligns with the double-reverse semantics.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_multiple.yaml (1)

1-12: No-pushdown multi-field reverse plan looks correct

The logical plan shows original + reversed sorts; the physical plan retains only the final reversed collation as an EnumerableSort (no pushdown), which matches the intended behavior for this configuration despite the filename.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_single.yaml (1)

1-13: Double-reverse no-pushdown logical/physical shape looks consistent

Logical and physical plans correctly reflect two reverses on a single sort (triple LogicalSort wrapper, single physical DESC sort under a LIMIT), matching the intended “double reverse = original ordering” behavior in the no-pushdown path.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_pushdown_single.yaml (1)

1-12: Single reverse no-pushdown plan matches flipped sort semantics

The expected logical/physical plans correctly implement sort - age | reverse as an ASC sort on age at the physical layer, while preserving the intermediate DESC/ASC logical structure.

integ-test/src/test/resources/expectedOutput/calcite/explain_double_reverse_pushdown_single.yaml (1)

1-15: Double-reverse pushdown correctly preserves original DESC sort at index

The explain output shows triple sorts in the logical plan but a single DESC sort in PushDownContext, so the pushed sort matches the original sort - age after two reverses, as desired.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_double_reverse_pushdown_multiple.yaml (1)

1-13: Multi-key double-reverse no-pushdown plan preserves original collation

Final physical sort on (age DESC, firstname ASC) after two reverses matches the expected “last sort wins, double reverse = original” semantics for multi-field sorts.

integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/explain_reverse_ignored.yaml (1)

1-11: Reverse-no-op (no sort, no @timestamp) is represented correctly

The plan omits any additional sort for reverse and only shows the fetch=[5] limit/sort for head 5, matching the “reverse ignored” behavior in this context.

integ-test/src/test/resources/expectedOutput/calcite/explain_reverse_ignored.yaml (1)

1-8: Pushdown reverse-no-op keeps only LIMITs in context

The pushed plan correctly omits any reverse sort and carries both the explicit head 5 and global QUERY_SIZE_LIMIT as LIMIT entries, with size=5 in the request builder.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

417-473: Reverse explain coverage for sort/no-sort/timestamp looks solid

The new ITs exercise the key reverse scenarios (ignored w/o sort/@timestamp, single & multi-field pushdown, double reverse, and @timestamp-driven sort) against explain output, which lines up well with the documented behavior matrix for this PR.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLReverseTest.java (5)

12-23: Reverse behavior Javadoc is clear and matches the implemented test matrix

The class-level documentation concisely captures the three reverse modes (flip existing collation, use @timestamp, no-op) and points to integration tests for the non-collation cases, which aligns well with the scenarios exercised below.


159-240: Planner tests for multiple/multi-field sorts with reverse are well-structured

The tests from testMultipleSortsWithReverseParserSuccess through testReverseWithFieldsAndSortParserSuccess accurately encode the expected logical plans and Spark SQL for:

  • multiple sequential sorts where reverse targets the last one,
  • multi-field collations with direction flipping, and
  • interaction with fields projections.

These should give good safety nets for future refactors of the reverse/backtracking logic.


242-290: Head-then-sort-then-reverse no-opt test correctly guards semantics

testHeadThenSortReverseNoOpt and testSortFieldsReverse explicitly assert the presence and ordering of the three LogicalSort nodes (fetch, sort, reverse) and the backtracking case where the sort key is projected away, which is important to prevent “helpful” optimizations that would silently change PPL semantics.


297-360: Reverse-no-op tests after aggregation/join match the “collation destroyed” rule

The tests asserting that reverse is ignored after aggregation and joins, and after sort followed by aggregation, correctly expect no additional LogicalSort node and no ORDER BY in the generated SQL, demonstrating that reverse doesn’t try to infer collation past blocking operators.


362-458: Reverse through filters/eval/join+sort is comprehensively covered

The remaining tests (reverse after where/eval/multiple filters, sort–join–sort–reverse, and reverse on a post-aggregation sort) collectively validate the backtracking strategy and ensure reverse only inverts collations where they’re still semantically valid, which is exactly what this optimization needs.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (8)

8-32: LGTM!

The import additions and setup modifications properly support the new test cases. Loading TIME_TEST_DATA for @timestamp field tests and STATE_COUNTRY for streamstats tests is appropriate.


35-110: LGTM!

The existing tests are correctly updated to include explicit sort commands before reverse, aligning with the new pushdown-friendly implementation. The expected results remain consistent, validating the optimization preserves correctness.


193-216: LGTM!

This test correctly validates that when no explicit sort exists but @timestamp is present, reverse adds a @timestamp DESC sort. The expected values align with the time-series test data.

Minor note: The comment on line 208 mentions "IDs 100, 99, 98, 97, 96" which may be outdated or confusing since the test is ordered by timestamp, not IDs.


218-229: LGTM!

This test correctly validates that explicit sort takes precedence over @timestamp fallback. The reverse operation flips the sort value direction rather than applying @timestamp DESC.


231-338: LGTM - addresses past review concerns about streamstats.

These tests comprehensively cover the streamstats | reverse scenarios discussed in previous reviews:

  • No-op when __stream_seq__ is projected out and not detectable
  • Working reverse when backtracking finds collation (partition case)
  • Working reverse with explicit sort after streamstats

This aligns with the backtracking implementation mentioned in the past review discussion.


358-372: Inconsistency between expected behavior and verification method.

The comment on line 367-368 states that with explicit sort and reverse, data should be in descending gender order (M, F). However, line 371 uses verifyDataRows (unordered) rather than verifyDataRowsInOrder.

If reverse is working correctly after sort gender, the output order should be deterministic and you should use ordered verification:

-    // Note: Due to column reordering after stats (c, gender), the result order
-    // may differ from expected. Using unordered verification for robustness.
-    verifyDataRows(result, rows(4, "M"), rows(3, "F"));
+    // With explicit sort and reverse, data is in descending gender order: M, F
+    verifyDataRowsInOrder(result, rows(4, "M"), rows(3, "F"));

If there's a known issue preventing ordered output after aggregation+sort+reverse, please clarify in the comment.


389-440: LGTM!

These tests properly validate that the backtracking logic traverses through non-blocking operators (filters, eval/project) to find the underlying sort collation. The expected results correctly reflect the filtered data in reversed sort order.


442-456: LGTM!

This test correctly validates that the @timestamp fallback only applies when @timestamp is present in the output schema. Since aggregation (stats count() as c by category) doesn't include @timestamp, reverse correctly becomes a no-op.

| **1. With existing sort**: Reverses the sort direction(s)
| **2. With @timestamp field (no explicit sort)**: Sorts by @timestamp in descending order
| **3. Without sort or @timestamp**: The command is ignored (no effect)
============
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix misplaced RST heading underline.

Line 19 has a stray ============ that appears to be a formatting artifact. In RST, heading underlines should directly follow heading text. This line should likely be removed or the preceding content restructured.

 | **2. With @timestamp field (no explicit sort)**: Sorts by @timestamp in descending order
 | **3. Without sort or @timestamp**: The command is ignored (no effect)
-============
 
 Behavior
 ========
🤖 Prompt for AI Agents
In docs/user/ppl/cmd/reverse.rst around line 19, remove the stray line
containing only "============" (or move it to directly underline a preceding
heading) because it is a misplaced RST heading underline; ensure any heading
underline in this file directly follows the heading text and that no standalone
underline artifacts remain.

Comment on lines +113 to +127
public void testReverseWithDescendingSort() throws IOException {
// Test reverse with descending sort (- age)
JSONObject result =
executeQuery(
String.format(
"source=%s | sort account_number | fields account_number | reverse | head 3",
"source=%s | sort - account_number | fields account_number | reverse",
TEST_INDEX_BANK));
verifySchema(result, schema("account_number", "bigint"));
verifyDataRowsInOrder(result, rows(32), rows(25), rows(20));
verifyDataRowsInOrder(
result, rows(1), rows(6), rows(13), rows(18), rows(20), rows(25), rows(32));
}

@Test
public void testReverseWithMixedSortDirections() throws IOException {
// Test reverse with mixed sort directions (- age, + firstname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix comments to match the actual sort fields.

The comments reference age but the code sorts by account_number:

  • Line 114: "Test reverse with descending sort (- age)"
  • Line 127: "Test reverse with mixed sort directions (- age, + firstname)"
   @Test
   public void testReverseWithDescendingSort() throws IOException {
-    // Test reverse with descending sort (- age)
+    // Test reverse with descending sort (- account_number)
     JSONObject result =
   @Test
   public void testReverseWithMixedSortDirections() throws IOException {
-    // Test reverse with mixed sort directions (- age, + firstname)
+    // Test reverse with mixed sort directions (- account_number, + firstname)
     JSONObject result =
🤖 Prompt for AI Agents
In
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 113 to 127, update the misleading inline comments that mention
"age" to reflect the actual sort field(s) used in the tests: change the comment
on line 114 to indicate a descending sort on account_number (e.g., "Test reverse
with descending sort (- account_number)") and change the comment on line 127 to
match the mixed sort fields used in that test (replace "- age, + firstname" with
the actual sort expression present in the test, e.g., "- account_number, +
firstname" or the correct fields), keeping comment style consistent with
surrounding tests.

Comment on lines +146 to +161
@Test
public void testDoubleReverseWithDescendingSort() throws IOException {
// Test double reverse with descending sort (- age)
JSONObject result =
executeQuery(
String.format(
"source=%s | sort - account_number | fields account_number | reverse | reverse",
TEST_INDEX_BANK));
verifySchema(result, schema("account_number", "bigint"));
verifyDataRowsInOrder(
result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
}

@Test
public void testDoubleReverseWithMixedSortDirections() throws IOException {
// Test double reverse with mixed sort directions (- age, + firstname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same comment inconsistency: "age" should be "account_number".

Lines 148 and 161 also reference "- age" but the code uses "- account_number".

   @Test
   public void testDoubleReverseWithDescendingSort() throws IOException {
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
     JSONObject result =
   @Test
   public void testDoubleReverseWithMixedSortDirections() throws IOException {
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)
     JSONObject result =
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Test
public void testDoubleReverseWithDescendingSort() throws IOException {
// Test double reverse with descending sort (- age)
JSONObject result =
executeQuery(
String.format(
"source=%s | sort - account_number | fields account_number | reverse | reverse",
TEST_INDEX_BANK));
verifySchema(result, schema("account_number", "bigint"));
verifyDataRowsInOrder(
result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
}
@Test
public void testDoubleReverseWithMixedSortDirections() throws IOException {
// Test double reverse with mixed sort directions (- age, + firstname)
@Test
public void testDoubleReverseWithDescendingSort() throws IOException {
// Test double reverse with descending sort (- account_number)
JSONObject result =
executeQuery(
String.format(
"source=%s | sort - account_number | fields account_number | reverse | reverse",
TEST_INDEX_BANK));
verifySchema(result, schema("account_number", "bigint"));
verifyDataRowsInOrder(
result, rows(32), rows(25), rows(20), rows(18), rows(13), rows(6), rows(1));
}
@Test
public void testDoubleReverseWithMixedSortDirections() throws IOException {
// Test double reverse with mixed sort directions (- account_number, + firstname)
🤖 Prompt for AI Agents
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java
around lines 146 to 161: the inline test comments incorrectly refer to "- age"
while the test actually sorts by account_number; update the two comments at
lines 148 and 161 to reference "- account_number" (or a neutral description like
"descending account_number") so the comment matches the code under test.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (1)

114-116: Inline comments still refer to “age” while tests sort by account_number

The four comments in the descending/mixed-sort tests still mention “age”, but the queries all sort on account_number:

  • Line 115: // Test reverse with descending sort (- age)
  • Line 128: // Test reverse with mixed sort directions (- age, + firstname)
  • Line 149: // Test double reverse with descending sort (- age)
  • Line 162: // Test double reverse with mixed sort directions (- age, + firstname)

This can mislead readers when debugging test failures.

Consider updating them along these lines:

-    // Test reverse with descending sort (- age)
+    // Test reverse with descending sort (- account_number)
...
-    // Test reverse with mixed sort directions (- age, + firstname)
+    // Test reverse with mixed sort directions (- account_number, + firstname)
...
-    // Test double reverse with descending sort (- age)
+    // Test double reverse with descending sort (- account_number)
...
-    // Test double reverse with mixed sort directions (- age, + firstname)
+    // Test double reverse with mixed sort directions (- account_number, + firstname)

Also applies to: 128-129, 149-150, 162-163

🧹 Nitpick comments (4)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (4)

289-307: Comment in testStreamstatsByWithReverse contradicts the expected reversed ordering

The leading comment says:

// Test that reverse is ignored after streamstats with partitioning (by clause)

but the assertion below uses verifyDataRowsInOrder(...) with the rows clearly reversed by __stream_seq__ within each country:

// With backtracking, reverse now works and reverses the __stream_seq__ order
verifyDataRowsInOrder(
    result,
    rows("Jane", "Canada", ..., 20, 2, 22.5),
    rows("John", "Canada", ..., 25, 1, 25),
    rows("Hello", "USA",  ..., 30, 2, 50),
    rows("Jake", "USA",   ..., 70, 1, 70));

To avoid confusion, the first comment should reflect that reverse does work here, for example:

-    // Test that reverse is ignored after streamstats with partitioning (by clause)
+    // Test that reverse backtracks through streamstats-by and reverses __stream_seq__ order

445-456: Clarify intent in testReverseWithTimestampAfterAggregation comment

The header says:

// Test that reverse uses @timestamp when aggregation destroys collation
// TIME_TEST_DATA has @timestamp field

but the body and expectations assert the opposite:

// Even though aggregation destroys collation, there's no @timestamp in the
// aggregated result, so reverse is a no-op
verifyDataRows(result, rows(26, "A"), rows(25, "B"), rows(25, "C"), rows(24, "D"));

Since the query stats count() as c by category | reverse drops @timestamp from the result, reverse is intentionally a no-op here.

Consider rephrasing the header to match the actual behavior, e.g.:

-    // Test that reverse uses @timestamp when aggregation destroys collation
-    // TIME_TEST_DATA has @timestamp field
+    // Test that reverse is a no-op after aggregation, even if the source index has @timestamp
+    // TIME_TEST_DATA has @timestamp, but the aggregated result no longer exposes it

359-373: testReverseAfterAggregationWithSort doesn’t actually validate the reversed ordering

The comments state that reverse should produce descending gender order:

// With explicit sort and reverse, data should be in descending gender order
// Sort by gender ASC: F, M -> Reverse: M, F

but the assertion uses verifyDataRows(...), which ignores row order and only checks the multiset of rows:

verifyDataRows(result, rows(4, "M"), rows(3, "F"));

If the goal is to verify that reverse correctly flips the post-aggregation sort, consider switching to verifyDataRowsInOrder(...) and tightening the comment; otherwise, adjust the comment to say you’re only validating the result set, not its ordering.

This is test quality, not functionality, so can be deferred but would strengthen coverage of the reverse behavior.


498-515: testTimechartWithGroupByAndReverse comment claims reversed order, but the assertion is unordered

The test description:

// Test timechart with group by and reverse
// The sort is on both @timestamp and the group by field
...
// All events are in the same hour, so only one time bucket
// Hosts are grouped and results are reversed

suggests you expect a specific row ordering after reverse, but the assertion uses verifyDataRows(...), which doesn’t check order:

verifyDataRows(
    result,
    rows("2024-07-01 00:00:00", "db-01", 1),
    rows("2024-07-01 00:00:00", "web-01", 2),
    rows("2024-07-01 00:00:00", "web-02", 2));

Given timechart appends an explicit ORDER BY and reverse adds another, the final ordering should be deterministic here. To actually exercise that, you could:

  • Use verifyDataRowsInOrder(...) with the expected reversed order, or
  • Relax the comment to just say you’re validating the grouped result content.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8977a8f and 22426ba.

📒 Files selected for processing (3)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteReverseCommandIT.java (6 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (3 hunks)
🔇 Additional comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLStreamstatsTest.java (1)

222-250: New testStreamstatsWithReverse expectations look consistent with non-null-bucket semantics

The logical and Spark SQL plans correctly model reverse as an added DESC sort on __stream_seq__ on top of the existing ASC sort from streamstats, and the aggregation uses the same non-null-bucket MAX window shape as testStreamstatsBy. Looks good.

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLTimechartTest.java (1)

53-78: Events table schema and timechart+reverse tests are coherent

  • The added created_at TIMESTAMP column is wired consistently: row literals, EventsTable protoRowType, and the new timefield=created_at test all agree on column ordering and types.
  • testTimechartWithReverse and testTimechartWithCustomTimefieldAndReverse correctly expect:
    • Inner ASC sort on the appropriate bucket (SPAN(@timestamp, ...) or SPAN(created_at, ...)) from timechart.
    • An outer DESC sort from reverse, with NULLS flipped to NULLS FIRST.
      This aligns with the PR’s “tier 1” behavior for reversing existing collations.

Also applies to: 370-425, 437-445

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 2 weeks with no activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Support reverse pushdown with Calcite

6 participants