[BUG] top and other bucket-dependent aggregations don't work on nested fields #4949

@Swiddis

Description

What is the bug?
Sibling issue of opensearch-project/OpenSearch#20218: top and other bucket-dependent aggregations don't work when the grouped field has "type": "nested"; the grouped values come back as null instead of per-value buckets. If this is intended engine behavior, we'll have to patch in a workaround on our side. If it's not intended, we can see about waiting on an upstream fix.

How can one reproduce the bug?
If a field has "type": "nested", any command that compiles to a composite bucket aggregation (top, stats ... by, dedup) fails to return the expected data: the grouped value is reported as null, or no rows come back at all.

Data & Mapping:

% cat test_map.json | jq
{
  "mappings": {
    "properties": {
      "rec": {
        "type": "nested",
        "properties": {
          "inner": {
            "type": "integer"
          }
        }
      }
    }
  }
}

% cat test_data.json
{"create": {"_index": "test"}}
{"rec":{"inner": 100}}
{"create": {"_index": "test"}}
{"rec":{"inner": 101}}
{"create": {"_index": "test"}}
{"rec":{"inner": 101}}

Queries that depend on composite buckets:

opensearchsql> source=test | top rec.inner;
fetched rows / total rows = 1/1
+-----------+-------+
| rec.inner | count |
|-----------+-------|
| null      | 3     |
+-----------+-------+
opensearchsql> source=test | stats count() by rec.inner;
fetched rows / total rows = 1/1
+---------+-----------+
| count() | rec.inner |
|---------+-----------|
| 3       | null      |
+---------+-----------+
> source=test | dedup rec.inner;
fetched rows / total rows = 0/0
+-----+
| rec |
|-----|
+-----+

Explained:

opensearchsql> explain source=test | top rec.inner;
= Calcite Plan =
== Logical ==
LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(rec.inner=[$0], count=[$1])
    LogicalFilter(condition=[<=($2, 10)])
      LogicalProject(rec.inner=[$0], count=[$1], _row_number_=[ROW_NUMBER() OVER (ORDER BY $1 DESC)])
        LogicalAggregate(group=[{0}], count=[COUNT()])
          LogicalProject(rec.inner=[$1])
            CalciteLogicalIndexScan(table=[[OpenSearch, test]])

== Physical ==
EnumerableLimit(fetch=[10000])
  EnumerableCalc(expr#0..2=[{inputs}], expr#3=[10], expr#4=[<=($t2, $t3)], proj#0..1=[{exprs}], $condition=[$t4])
    EnumerableWindow(window#0=[window(order by [1 DESC] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
      CalciteEnumerableIndexScan(table=[[OpenSearch, test]], PushDownContext=[[AGGREGATION->rel#597:LogicalAggregate.NONE.[](input=RelSubset#596,group={0},count=COUNT())], OpenSearchRequestBuilder(sourceBuilder={
        "from": 0,
        "size": 0,
        "timeout": "1m",
        "aggregations": {
          "composite_buckets": {
            "composite": {
              "size": 1000,
              "sources": [
                {
                  "rec.inner": {
                    "terms": {
                      "field": "rec.inner",
                      "missing_bucket": true,
                      "missing_order": "first",
                      "order": "asc"
                    }
                  }
                }
              ]
            }
          }
        }
      }, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
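
For what it's worth, the composite aggregation pushed down above can be replayed directly against _search (trimmed to the essentials below) to take the plugin's response handling out of the picture; this is essentially the engine-level reproduction tracked in opensearch-project/OpenSearch#20218.

% curl -s -XPOST 'localhost:9200/test/_search' \
    -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "sources": [
          { "rec.inner": { "terms": { "field": "rec.inner", "missing_bucket": true } } }
        ]
      }
    }
  }
}'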

opensearchsql> explain source=test | stats count() by rec.inner;
= Calcite Plan =
== Logical ==
LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(count()=[$1], rec.inner=[$0])
    LogicalAggregate(group=[{0}], count()=[COUNT()])
      LogicalProject(rec.inner=[$1])
        CalciteLogicalIndexScan(table=[[OpenSearch, test]])

== Physical ==
CalciteEnumerableIndexScan(table=[[OpenSearch, test]], PushDownContext=[[AGGREGATION->rel#1164:LogicalAggregate.NONE.[](input=RelSubset#1163,group={0},count()=COUNT()), PROJECT->[count(), rec.inner], LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={
  "from": 0,
  "size": 0,
  "timeout": "1m",
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 10000,
        "sources": [
          {
            "rec.inner": {
              "terms": {
                "field": "rec.inner",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc"
              }
            }
          }
        ]
      }
    }
  }
}, requestedTotalSize=10000, pageSize=null, startFrom=0)])

opensearchsql> explain source=test | dedup rec.inner;
= Calcite Plan =
== Logical ==
LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])
  LogicalProject(rec=[$0])
    LogicalFilter(condition=[<=($8, 1)])
      LogicalProject(rec=[$0], rec.inner=[$1], _id=[$2], _index=[$3], _score=[$4], _maxscore=[$5], _sort=[$6], _routing=[$7], _row_number_dedup_=[ROW_NUMBER() OVER (PARTITION BY $1 ORDER BY $1)])
        LogicalFilter(condition=[IS NOT NULL($1)])
          CalciteLogicalIndexScan(table=[[OpenSearch, test]])

== Physical ==
CalciteEnumerableIndexScan(table=[[OpenSearch, test]], PushDownContext=[[PROJECT->[rec, rec.inner], AGGREGATION->rel#953:LogicalAggregate.NONE.[](input=LogicalProject#951,group={0},agg#0=LITERAL_AGG(1)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={
  "from": 0,
  "size": 0,
  "timeout": "1m",
  "_source": {
    "includes": [
      "rec",
      "rec.inner"
    ],
    "excludes": []
  },
  "aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 10000,
        "sources": [
          {
            "rec.inner": {
              "terms": {
                "field": "rec.inner",
                "missing_bucket": false,
                "order": "asc"
              }
            }
          }
        ]
      },
      "aggregations": {
        "$f1": {
          "top_hits": {
            "from": 0,
            "size": 1,
            "version": false,
            "seq_no_primary_term": false,
            "explain": false,
            "_source": {
              "includes": [
                "rec.inner",
                "rec"
              ],
              "excludes": []
            },
            "script_fields": {}
          }
        }
      }
    }
  }
}, requestedTotalSize=10000, pageSize=null, startFrom=0)])
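
Nested fields are indexed as separate Lucene documents, so a flat terms source on rec.inner has nothing to group at the root-document level. A hand-written aggregation that wraps the terms in a nested aggregation (standard query DSL; shown here only as a sketch of one shape a plugin-side workaround could take, not something the plugin emits today) does group on the inner values:

% curl -s -XPOST 'localhost:9200/test/_search' \
    -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggregations": {
    "rec_nested": {
      "nested": { "path": "rec" },
      "aggregations": {
        "rec.inner": {
          "terms": { "field": "rec.inner" }
        }
      }
    }
  }
}'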

What is the expected behavior?

opensearchsql> source=test | top rec.inner;
fetched rows / total rows = 2/2
+-----------+-------+
| rec.inner | count |
|-----------+-------|
| 101       | 2     |
| 100       | 1     |
+-----------+-------+

What is your host/environment?

  • OS: Linux 5.10
  • Version: mainline, 3.3.2
  • Plugins: Yours truly

Do you have any screenshots?
N/A

Do you have any additional context?
N/A

Labels: bug (Something isn't working)
Status: Done