Push down filters on nested fields as nested queries #4825

yuancu · 2025-11-19T07:38:11Z

Description

Because nested fields are indexed as hidden documents, we cannot query them directly. Instead, we have to use the nested query to access them.

For example, for the following mapping:

{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "items": {
        "type": "nested",
        "properties": {
          "name": {"type": "keyword"}
        }
      }
    }
  }
}

if we want to use term query to match those items with name banana, instead of using:

{
  "query": {
    "term": {
      "items.name": {
        "value": "banana",
        "boost": 1
      }
    }
  }
}

We should use

{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "term": {
          "items.name": {
            "value": "banana",
            "boost": 1
          }
        }
      }
    }
  }
}

Here, the nested clause steps down into the nested items field. It no longer has access to fields in the root document, nor fields in any other nested document.

Work items

Fix the case in the issue [BUG] PPL filter on computed fields from nested paths returns zero results #4508: script's access to a single nested field.
Support querying deep nested objects: if a field of a nested object is another nested object, we should construct a nested query using the deepest level
Support nested query in other types of filter.
- Term query
- Range query
- Multi-terms query
Support filters on both nested and root objects, where they can be separated. E.g. | where items.name = 'banana' and id = 2 can be separated to two filters: items.name = 'banana' stays under a nested query, while id = 2 stays on a normal term query, combined with a bool query. Already implemented.
Support filters on nested and root objects, where they can not be separated

Related Issues

Resolves #4508

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

New Features
- Nested field filtering now supports computed fields and multiple filter conditions
- Cascaded nested field hierarchies are fully supported in queries
- Enhanced query optimization for filters and aggregations on nested fields
Improvements
- Better handling of complex nested structures in query execution and pushdown optimization

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

…scripts Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

qianheng-aws · 2025-11-21T09:36:11Z

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java

+                nestedPaths));
+      }
+      if (!nestedPaths.isEmpty()) {
+        return nestedQuery(nestedPaths.get(0), scriptQuery, ScoreMode.None);


Since we have supported Agg script and Sort script now, I think we need to identify whether it's a filter script before wrapping it with nested query. And please add a test on agg script with nested fields.

Script sort does not make use of PredicateAnalyzer.QueryExpression, thus does not interleave with the current change:

sql/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java

Line 499 in c31227c

() -> SortBuilders.scriptSort(scriptExpr.getScript(), sortType).order(order));

The same is with field sort

sql/opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java

Line 476 in c31227c

() -> SortBuilders.fieldSort(digest.getFieldName()).order(order).missing(missing));

However, they still need nested queries to access nested fields. As a result, the following queries does not work (author is a nested field)

script sort: source=opensearch-sql_test_index_cascaded_nested | eval lower_name = lower(author.name) | sort lower_name

field sort: source=opensearch-sql_test_index_cascaded_nested | sort author.name

Should I fix them in this PR or another one?

For agg script, I added a test case. Yet I doubt I still miss many cases? Originally I thought they were only handled in AggregateAnalyzer, so I could raise another PR for nested fields in aggregations.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

…to issues/4508

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

coderabbitai · 2025-11-27T02:23:47Z

Walkthrough

The changes implement comprehensive support for filtering on nested fields in OpenSearch SQL queries. New test cases validate filtering scenarios including computed nested fields, combined nested and root field filters, and cascaded nested hierarchies. Core predicate analysis logic is enhanced to detect nested field paths and wrap filters in proper OpenSearch nested query DSL, while test indices and expected output fixtures are added to support these validation scenarios.

Changes

Cohort / File(s)	Summary
Test Implementation `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java`, `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java`	Add five new test methods for nested field filtering: testFilterOnComputedNestedFields, testFilterOnNestedAndRootFields, testFilterOnNestedFields, testFilterOnMultipleCascadedNestedFields, testAggFilterOnNestedFields. CalciteWhereCommandIT also adds testScriptFilterOnDifferentNestedHierarchyShouldThrow. Both update init() to load NESTED_SIMPLE, DEEP_NESTED, and CASCADED_NESTED indices.
Test Infrastructure & Constants `integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java`, `integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java`	Add CASCADED_NESTED enum constant to Index with mapping and data files; introduce TEST_INDEX_CASCADED_NESTED string constant.
Test Data & Mapping `integ-test/src/test/resources/cascaded_nested.json`, `integ-test/src/test/resources/indexDefinitions/cascaded_nested_index_mapping.json`	Introduce cascaded nested test dataset (three author documents with nested books and reviews) and corresponding ES mapping defining nested author, books (nested), and reviews (nested) structures.
Expected Output — Calcite Pushdown `integ-test/src/test/resources/expectedOutput/calcite/filter_computed_nested.yaml`, `filter_nested_term.yaml`, `filter_nested_terms.yaml`, `filter_root_and_nested.yaml`, `filter_multiple_nested_cascaded_range.yaml`, `agg_filter_nested.yaml`	Add six YAML fixtures containing expected Calcite logical and physical plans for nested filter pushdown scenarios, including script queries, nested DSL wrapping, and aggregation pushdown contexts.
Expected Output — Calcite No-Pushdown `integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_computed_nested.yaml`, `filter_nested_term.yaml`, `filter_nested_terms.yaml`, `filter_root_and_nested.yaml`, `filter_multiple_nested_cascaded_range.yaml`, `agg_filter_nested.yaml`	Add six YAML fixtures containing expected Calcite logical and physical plans for nested filter scenarios without pushdown optimization, showing EnumerableCalc and EnumerableAggregate operators.
REST API Integration Test `integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4508.yml`	Add REST API test suite validating nested field evaluation with LENGTH() compute, filtering on computed fields, and mixed nested/root-level field conditions.
Core Predicate Analysis `opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java`	Enhance NamedFieldExpression with nestedPath tracking; add resolveNestedPath() utility for nested field detection; update SimpleQueryExpression and ScriptQueryExpression builders to wrap filters with nestedQuery DSL when nested paths are detected; add referredFields and fieldTypes tracking to ScriptQueryExpression.
Index Scan Filter Pushdown `opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java`	Modify pushDownFilter fieldTypes calculation to include all index field types instead of filtering to current schema fields only.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Calcite
    participant PredicateAnalyzer
    participant FilterBuilder
    participant OpenSearch as OpenSearch Query DSL

    Note over Calcite,OpenSearch: Filter on Nested Field Processing

    Calcite->>PredicateAnalyzer: analyzeFilter(field: "items.name", operator, value)
    PredicateAnalyzer->>PredicateAnalyzer: resolveNestedPath("items.name", fieldTypes)
    Note over PredicateAnalyzer: Detect nested path: "items"
    
    alt Simple Filter (term/range)
        PredicateAnalyzer->>FilterBuilder: buildSimpleQuery(condition)
        Note over FilterBuilder: Create term/range query on items.name
        FilterBuilder->>FilterBuilder: wrapInNestedQuery(query, path="items")
        FilterBuilder-->>OpenSearch: {nested: {path: "items", query: {...}}}
    else Script Filter (computed field)
        PredicateAnalyzer->>FilterBuilder: buildScriptQuery(expression: "LENGTH(items.name) > 5")
        Note over FilterBuilder: Create script expression<br/>tracking referredFields
        FilterBuilder->>FilterBuilder: Detect single nested path from referredFields
        FilterBuilder->>FilterBuilder: wrapInNestedQuery(scriptQuery, path="items")
        FilterBuilder-->>OpenSearch: {nested: {path: "items", query: {script: {...}}}}
    else Multiple Nested Paths
        PredicateAnalyzer->>PredicateAnalyzer: Detect paths: ["items", "metadata"]
        PredicateAnalyzer-->>PredicateAnalyzer: throw UnsupportedScriptException
    end

    OpenSearch->>OpenSearch: Evaluate nested query with proper semantics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

PredicateAnalyzer.java: Core nested path resolution and query wrapping logic requires careful validation for correctness across simple filters, script queries, and aggregations; nested path detection and multiple-path error handling logic must be thoroughly reviewed
Cascaded nested test scenarios: Multi-level nested hierarchies (author → books → reviews) with filtering across different levels require validation that nested DSL wrapping applies correct path semantics
Script query nested wrapping: Ensure referredFields tracking correctly identifies all nested field dependencies and that script wrapping logic handles edge cases (e.g., mixing nested and root fields)
Expected output YAML fixtures: Verify physical plan pushdown contexts accurately reflect nested query DSL structure, particularly complex scenarios like computed filters and aggregation filters over nested fields

Poem

🐰 In nested fields we dig so deep,
Where reviews sleep and books they keep,
Now scripts don't fail when paths are far,
DSL wraps queries like a star! ⭐
Cascaded hierarchies at last,
Filtering shadows of the past!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: pushing down filters on nested fields as nested queries, which aligns with the PR's core objective to fix filtering on nested fields.
Linked Issues check	✅ Passed	The PR implements the core objective from #4508: generating proper OpenSearch nested query DSL instead of script queries when expressions reference nested fields, addressing the bug where filters on computed nested fields returned zero results.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with the PR objectives: adding test coverage for nested field filtering, implementing nested query generation in PredicateAnalyzer, and extending field type handling in CalciteLogicalIndexScan for proper nested path resolution.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4508.yml (1)

28-42: Consider adding edge case tests for nested field filtering completeness.

While the core scenarios are well covered, the test suite could benefit from additional edge cases to improve robustness:

Empty or null nested fields: What happens when items is missing, empty, or contains null values?

Multiple nested values per document: Documents with multiple items (e.g., items: [{"name": "apple"}, {"name": "banana"}]) to validate flattening and filtering behavior.

Deep nesting levels: Test cascaded nested objects (if the PR supports this per the objectives).

These tests would validate boundary conditions and ensure the fix handles all nested field scenarios correctly.

Also applies to: 44-58

integ-test/src/test/resources/expectedOutput/calcite/filter_multiple_nested_cascaded_range.yaml (1)

8-8: Consider formatting for long-line readability in YAML test fixtures.

Line 8 contains the entire physical plan and OpenSearchRequestBuilder on a single line, making it difficult to inspect and maintain. For complex nested structures, consider formatting the embedded JSON with newlines or splitting across multiple lines if the YAML parser supports it.

This is a minor maintainability concern and does not affect test correctness.
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (1)
1848-1867: Consider adding null check for name parameter.

The method checks for null/empty fieldTypes but not for null name. If name is null, line 1852 (name.contains(".")) will throw a NullPointerException.

Looking at the callers, name could potentially be null when derived from schema lookups that return null for out-of-bounds indices (e.g., line 1614). Consider adding a null check:
 private static String resolveNestedPath(String name, Map<String, ExprType> fieldTypes) {
-  if (fieldTypes == null || fieldTypes.isEmpty()) {
+  if (name == null || fieldTypes == null || fieldTypes.isEmpty()) {
     return "";
   }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b88bf56 and d2b94ea.

📒 Files selected for processing (21)

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (4 hunks)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java (1 hunks)
integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1 hunks)
integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java (1 hunks)
integ-test/src/test/resources/cascaded_nested.json (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/agg_filter_nested.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/filter_computed_nested.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/filter_multiple_nested_cascaded_range.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/filter_nested_term.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/filter_nested_terms.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite/filter_root_and_nested.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/agg_filter_nested.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_computed_nested.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_multiple_nested_cascaded_range.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_nested_term.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_nested_terms.yaml (1 hunks)
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_root_and_nested.yaml (1 hunks)
integ-test/src/test/resources/indexDefinitions/cascaded_nested_index_mapping.json (1 hunks)
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4508.yml (1 hunks)
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (10 hunks)
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java (1)

integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java (1)

TestsConstants (9-102)

opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java (2)

core/src/main/java/org/opensearch/sql/calcite/type/ExprIPType.java (1)

ExprIPType (12-16)

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)

OpenSearchTypeFactory (63-409)

🪛 Biome (2.1.2)

integ-test/src/test/resources/cascaded_nested.json

[error] 1-2: End of file expected