
Conversation

@yuancu
Collaborator

@yuancu yuancu commented Dec 2, 2025

Description

Previously, we implemented our own type-checking mechanism because we lacked a SQL validation phase (which relies on SqlNode) and had compatibility issues with user-defined types.

With this PR, I try to solve these problems and migrate to Calcite's type-checking mechanism by introducing a SqlNode layer. This is Approach 1 described in #3865.

It has the following advantages:

  • A full-fledged type-checking mechanism. We used to support only chains of family operand type checking; Calcite's built-in type checkers additionally support variable-length, same-operand, and or-relationship type checkers.
  • Omitting unnecessary function implementations by rewriting the SqlNode. E.g. we can replace a call to sqrt(x) with pow(x, 0.5) at the SqlNode layer (see the sketch after this list).
  • Automatic type casting when necessary
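
To illustrate the second point, a SqlNode-level rewrite of sqrt(x) into pow(x, 0.5) could look roughly like the sketch below. This is a hypothetical, simplified example (the class name is invented and it is not the convertlet used in this PR); it only relies on Calcite's SqlShuttle and the standard POWER operator:

  import org.apache.calcite.sql.SqlCall;
  import org.apache.calcite.sql.SqlLiteral;
  import org.apache.calcite.sql.SqlNode;
  import org.apache.calcite.sql.fun.SqlStdOperatorTable;
  import org.apache.calcite.sql.parser.SqlParserPos;
  import org.apache.calcite.sql.util.SqlShuttle;

  /** Illustrative only: rewrites sqrt(x) into power(x, 0.5) at the SqlNode layer. */
  public class SqrtToPowerShuttle extends SqlShuttle {
    @Override
    public SqlNode visit(SqlCall call) {
      if ("SQRT".equalsIgnoreCase(call.getOperator().getName())) {
        // Rewrite the call so no dedicated sqrt implementation is needed at runtime.
        SqlNode operand = call.operand(0);
        return SqlStdOperatorTable.POWER.createCall(
            SqlParserPos.ZERO, operand, SqlLiteral.createExactNumeric("0.5", SqlParserPos.ZERO));
      }
      return super.visit(call);
    }
  }

Applying such a shuttle to the statement's SqlNode tree before validation means only pow needs a runtime implementation.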

During review, please feel free to ask why something was changed or why something no longer matches!

Work items:

  • Re-introduce SqlNode: RelNode -> SqlNode -(validation & coercion)-> SqlNode -> RelNode
  • Enable type casting
  • Support UDT: Calcite's type checking relies on the enum SqlTypeName, so there is no room for UDTs there. I worked around it by replacing the operand types of a function call before type checking and restoring the original types afterwards, e.g. EXPR_TIME -> SqlTypeName.TIME (1090/1599 ITs passed)
  • UDT compatibility at a deeper level: we converted UDTs to SQL types to use Calcite's type-checking system (e.g. SqlTypeName.TIMESTAMP). However, when the return type depends on the argument types, the return type ends up being a SQL type instead of a UDT, leading to an incorrect plan and validation failures. We need an approach that uses UDTs only at the top level and SQL types in intermediate steps.
  • Cross-type comparison: comparison between different datetime types (e.g. date vs. time) is not supported in SQL:
    SELECT * FROM EMPS WHERE JOINEDAT=TIME(12,0,0);
    Error: Cannot apply '=' to arguments of type '<DATE> = <TIME(0)>'. Supported form(s): '<COMPARABLE_TYPE> = <COMPARABLE_TYPE>' (state=,code=0)
    We have to replicate "Extend comparison methods to accept different datetime types. (#129)" (#1196) to add the conversion manually.
    Fixed by overriding commonTypeForBinaryComparison (1240/1599 ITs passed); see the sketch after this list.
  • There is no IP type in Calcite. For the purpose of type checking, I may have to use an existing SQL type to fake it.
  • Support overloads of operators
    • Overloaded IP comparison: convert the call to the IP comparator when converting SqlNode to RelNode if either operand is an IP.
    • ATAN has a one-operand and a two-operand version: define a new operator that rewrites the SqlCall conditionally
    • ADD can be string concatenation or number addition: fixed by resolving the call conditionally in FuncImpTable
  • Null direction is not preserved in the SqlNode
  • Rewritten sort orders are not preserved
  • ANTI and SEMI joins are not supported
  • IT Fixes
    • CalcitePPLDateTimeBuiltinFunctionIT
      • mismatch in interval semantics between SQL and PPL
    • CalcitePPLBuiltinFunctionIT
      • LOG(a, b)
      • sarg deserialization bug (in testRand)
    • CalciteArrayFunctionIT
      • type checkers for all array functions are not defined
      • the reduce function's type inference is incomplete -- it must derive return types at both the SqlNode and RelNode levels and ensure they are the same; the current implementation works only at the RelNode level
    • CalciteMathematicalFunctionIT
    • CalcitePPLAggregationIT
    • CalciteBinCommandIT -- Fixed by ignoring bin-on-timestamp as it is not implemented
      • Problem 1: "Cannot apply '-' to arguments of type '<TIMESTAMP(0)> - <TIMESTAMP(0)>'. Supported form(s): '<NUMERIC> - <NUMERIC>', '<DATETIME_INTERVAL> - <DATETIME_INTERVAL>', '<DATETIME> - <DATETIME_INTERVAL>'" in ITs testStatsWithBinsOnTimeAndTermField_Avg and testStatsWithBinsOnTimeAndTermField_Count
      • Problem 2: Windowed aggregate expression is illegal in GROUP BY clause in IT testStatsWithBinsOnTimeField_Count and testStatsWithBinsOnTimeField_Avg
    • CalciteStreamstatsCommandIT
      • Assisting columns in sort-by are projected
      • Does not handle bucket_nullable correctly
    • CalcitePPLJsonBuiltinFunctionIT: stringification is not correct
  • No longer tolerant of malformed string numbers: fixed by replacing CAST with SAFE_CAST
  • The nullability attribute changes after SAFE_CAST, so validation breaks: fixed by resetting nullability after coercion with SAFE_CAST
  • RelHint on LogicalAggregate is lost during the RelNode -> SqlNode -> RelNode round-trip conversion: fixed by adding a SqlHint after converting the LogicalAggregate
  • Many push-downs rely on specific sequences of logical operations, and the SqlNode -> RelNode conversion changes the plan; we need to re-activate them.
  • Add unit & integration tests
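
As a rough illustration of the commonTypeForBinaryComparison fix referenced in the cross-type comparison item above, the sketch below widens mismatched datetime comparisons to TIMESTAMP by subclassing Calcite's TypeCoercionImpl. The class name is invented and the actual PplTypeCoercion in this PR is more nuanced; the sketch only assumes that commonTypeForBinaryComparison is overridable, which the fix above relies on:

  import org.apache.calcite.rel.type.RelDataType;
  import org.apache.calcite.rel.type.RelDataTypeFactory;
  import org.apache.calcite.sql.type.SqlTypeName;
  import org.apache.calcite.sql.validate.SqlValidator;
  import org.apache.calcite.sql.validate.implicit.TypeCoercionImpl;

  /** Illustrative only: allow comparisons across different datetime types. */
  public class DatetimeComparisonCoercion extends TypeCoercionImpl {
    private final RelDataTypeFactory typeFactory;

    public DatetimeComparisonCoercion(RelDataTypeFactory typeFactory, SqlValidator validator) {
      super(typeFactory, validator);
      this.typeFactory = typeFactory;
    }

    @Override
    public RelDataType commonTypeForBinaryComparison(RelDataType type1, RelDataType type2) {
      if (type1 != null
          && type2 != null
          && SqlTypeName.DATETIME_TYPES.contains(type1.getSqlTypeName())
          && SqlTypeName.DATETIME_TYPES.contains(type2.getSqlTypeName())
          && type1.getSqlTypeName() != type2.getSqlTypeName()) {
        // e.g. DATE vs. TIME: compare both sides as TIMESTAMP instead of rejecting the call.
        return typeFactory.createSqlType(SqlTypeName.TIMESTAMP);
      }
      return super.commonTypeForBinaryComparison(type1, type2);
    }
  }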

Related Issues

Resolves #4636, resolves #3865

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-project opensearch-project deleted a comment from coderabbitai bot Dec 2, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced type coercion system for improved implicit type conversion in queries
    • Extended support for Spark SQL dialect compatibility
    • Improved IP type handling and validation
  • Bug Fixes

    • Fixed type casting issues in filter and aggregation operations
    • Corrected timestamp and datetime type handling in expressions
    • Resolved operand type validation for various SQL functions
  • Documentation

    • Updated expected query plan outputs to reflect improved type handling

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Introduces a Calcite-backed validation and implicit type coercion pipeline for PPL by converting RelNode↔SqlNode around validation. Adds PPL-specific validator, coercion rules, converters/shuttles, and convertlets. Refactors type utilities/factories, operand metadata, and function registry. Updates numerous tests (JSON→YAML expectations) and docs. Adjusts window collation embedding and arithmetic/type derivation.

Changes

Cohort / File(s) Change Summary / Review Focus
Validation & Coercion Pipeline
core/.../QueryService.java, core/.../calcite/CalcitePlanContext.java, core/.../validate/PplValidator.java, core/.../validate/PplTypeCoercion.java, core/.../validate/PplTypeCoercionRule.java, core/.../validate/converters/PplRelToSqlNodeConverter.java, core/.../validate/converters/PplSqlToRelConverter.java, core/.../validate/shuttles/SqlRewriteShuttle.java, core/.../validate/shuttles/SkipRelValidationShuttle.java, core/.../validate/shuttles/PplRelToSqlRelShuttle.java, core/.../validate/OpenSearchSparkSqlDialect.java, core/.../validate/SqlOperatorTableProvider.java, core/.../validate/ValidationUtils.java
Adds end-to-end Rel→Sql validation with implicit casts; new validator/coercion rule sets; converters for SEMI/ANTI joins; shuttles for rewrites and literal normalization; dialect tweaks (liberal mode, IP cast). Review control flow, error handling (tolerant exceptions), and join handling.
Convertlets
core/.../validate/PplConvertletTable.java
Custom convertlets; IP-aware comparisons and ATAN remap. Validate coverage and operator mapping precedence.
Type Utilities & Factory
core/.../utils/OpenSearchTypeUtil.java, core/.../utils/OpenSearchTypeFactory.java, core/.../calcite/ExtendedRexBuilder.java, core/.../calcite/utils/CalciteToolsHelper.java
New utility for UDT/numeric/datetime checks; leastRestrictive override; arithmetic type derivation for mixed string/number; error chaining for WIDTH_BUCKET. Verify UDT conversions and nullability propagation.
Rel/Rex Visitors
core/.../calcite/CalciteRelNodeVisitor.java, core/.../calcite/CalciteRexNodeVisitor.java
Embed existing collations into window OVER clauses; simplify window function building. Check ordering semantics and planner determinism.
Function Registry & Operand Metadata Refactor
core/.../expression/function/PPLFuncImpTable.java, core/.../expression/function/UDFOperandMetadata.java, core/.../expression/function/PPLBuiltinOperators.java, core/.../calcite/utils/PPLOperandTypes.java
Switch to single-implementation registry (no per-signature lists); unify on OperandTypes and custom checkers; many operators converted to SqlFunction. Validate registration, arity/rules, and backward compatibility.
Removed Legacy Type Checking/Coercion
core/.../expression/function/PPLTypeChecker.java (removed), core/.../expression/function/CoercionUtils.java (removed), core/.../expression/function/CalciteFuncSignature.java (removed), core/.../test/.../CoercionUtilsTest.java (removed)
Eliminate bespoke type-coercion/signature layer in favor of Calcite validation. Confirm no remaining dependencies.
UDF Implementations (Collections)
core/.../CollectionUDF/*
Define operand metadata (ARRAY/MAP/FUNCTION/VARIADIC), rework type inference (e.g., Reduce/Transform). Verify bindings on SqlCallBinding and optional operands.
UDF Implementations (IP)
core/.../udf/ip/*
Custom operand checkers for IP functions (IP, CIDRMATCH, comparisons). Ensure OpenSearchTypeUtil.isIp coverage including OTHER acceptance where applicable.
Width Bucket/Binning
core/.../utils/binning/BinnableField.java, core/.../udf/binning/WidthBucketFunction.java
Switch to OpenSearchTypeUtil checks; return-type inference adjusted. Confirm timestamp/date handling.
API/Test Adjustments
api/.../UnifiedQueryTranspilerTest.java
Update dialect import and expected SQL (SAFE_CAST→equality). Aligns with new coercion.
Integration Tests (Java)
integ-test/src/test/java/...
New/updated tests (e.g., mixed-type BETWEEN/IN), YAML explain adoption, expected exception type changes, conditional skips. Review new flows relying on SAFE_CAST and YAML comparators.
Integration Tests (Expected YAML)
integ-test/src/test/resources/expectedOutput/calcite/*
Wide updates from JSON→YAML, plan reshaping (SAFE_CAST, EXPR_TIMESTAMP/DATE/TIME, reordered nodes, map literal formatting). Validate against new validator/coercion semantics.
Docs
docs/user/ppl/*
Example updates/temporarily ignored blocks; adjusted outputs reflecting new casting and formatting. Confirm accuracy with current behavior.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant QueryService
  participant RelBuilder as PPL RelNode
  participant RelToSql as PplRelToSqlNodeConverter
  participant Validator as PplValidator/TypeCoercion
  participant SqlToRel as PplSqlToRelConverter
  participant Optimizer as Planner/Rules
  participant OpenSearch as DataStore

  Client->>QueryService: executeWithCalcite/ explainWithCalcite(PPL)
  QueryService->>RelBuilder: build initial RelNode
  QueryService->>RelBuilder: apply PplRelToSqlRelShuttle (normalize literals)
  QueryService->>RelToSql: convert RelNode -> SqlNode
  QueryService->>Validator: validate SqlNode (implicit casts, UDT mapping)
  alt Validation tolerant failure
    Validator-->>QueryService: throws known issue
    QueryService-->>QueryService: fallback to original RelNode
  else Success
    Validator-->>QueryService: validated SqlNode
    QueryService->>SqlToRel: SqlNode -> RelNode (Spark SEMI/ANTI fixups)
  end
  QueryService->>Optimizer: optimize RelNode
  Optimizer->>OpenSearch: pushdown (PROJECT/FILTER/AGG/SORT)
  OpenSearch-->>QueryService: results/plan
  QueryService-->>Client: response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested labels

backport 2.19-dev

Suggested reviewers

  • penghuo
  • LantaoJin
  • anirudha
  • ps48
  • ykmr1224
  • dai-chen
  • kavithacm
  • derek-ho

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.81% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Out of Scope Changes check ❓ Inconclusive Most changes focus on the Calcite migration (validators, coercion rules, converters, type utilities, operand metadata, and test updates), but several large public-API removals and registry/signature refactors are present. Confirm intent for breaking API removals/signature changes (e.g., PPLTypeChecker, CoercionUtils, CalciteFuncSignature, widespread SqlOperator→SqlFunction and registry shape changes); if intentional, document as breaking changes or split into a compatibility/major-version PR and add migration notes.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed Title concisely and accurately describes the primary change: switching to Calcite's validation system for type checking and coercion.
Description check ✅ Passed Description is detailed and directly related to the changeset, listing goals, approach, work items, and known limitations.
Linked Issues check ✅ Passed Implementation aligns with linked issues: reintroduces SqlNode validation, PplValidator, PplTypeCoercion/Rule, converters, and UDT handling to leverage Calcite coercion.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@yuancu yuancu added the enhancement New feature or request label Dec 3, 2025
@yuancu yuancu force-pushed the issues/4636 branch 5 times, most recently from e085f81 to fc6dd27 Compare December 15, 2025 13:40
…checking

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java

# Conflicts:
#	core/src/main/java/org/opensearch/sql/executor/QueryService.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
… logics

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- 2 more ITs passed in PPLBuiltinFunctionIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- this fixes testRand, where deserialization of sarg does not restore its type
- todo: update the toRex in ExtendedRelJson to align with the latest version

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…estamp; (time, timestamp) -> timestamp (1240/1599)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…2/1872)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
- allow type cast
- rewrite call to sql compare to custom ip compare

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…1356/1599 | 1476/1915)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…d in mvindex's implementation (1580/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…iting (1579/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…pe hint

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
…e inference (1701/2015)

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu yuancu changed the title Use Calcite's type checkers for type validation & coercion Use Calcite's validation system for type checking & coercion Dec 22, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
docs/user/ppl/functions/collection.md (1)

165-189: Multiple collection function examples disabled—add user guidance and timeline.

Five examples across TRANSFORM and MVMAP sections (lambda-based functions) are disabled pending issue #4972. These appear to be core use cases:

  • TRANSFORM: Both single-param (x → x + 2) and two-param ((x, i) → x + i) examples disabled
  • MVMAP: All three examples disabled, including the nested-expression case

This leaves users with no working documentation examples for these functions. For now, consider adding a single prominent note at the beginning of each disabled section (rather than per-example) explaining the limitation and expected resolution timeline, or provide workaround patterns if feasible.

Also applies to: 817-863

docs/user/ppl/cmd/bin.md (1)

453-463: Expand context for disabled binning example with strings.

Example 20 references issue #4973 but lacks explanation of the limitation. Consider adding a brief note describing why bin with string fields is unsupported and suggesting an alternative if one exists, so users understand the constraint rather than assuming a documentation error.

integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java (1)

40-758: Some YAML resource files are missing.

ExplainIT.java references 58 YAML files, but only 56 exist in integ-test/src/test/resources/expectedOutput/ppl/. The missing files are:

  • explain_filter_function_script_push.yaml
  • explain_filter_script_push.yaml

Both are referenced in test methods marked with @Ignore("The serialized string is unstable because of function properties"), so these tests won't execute. Either create the YAML files for completeness when re-enabling these tests, or remove the loadExpectedPlan() calls from the ignored test methods.

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)

21-21: Add proper JavaDoc to public class and methods.

The coding guidelines require all public classes and methods to have proper JavaDoc. Please add:

  • Class-level JavaDoc explaining the purpose of RelevanceQueryFunction
  • JavaDoc for getReturnTypeInference() with @return tag
  • Convert the comment block (lines 32-38) to proper JavaDoc format for getOperandMetadata() with @return tag

As per coding guidelines, "All public classes and methods must have proper JavaDoc" and "Public methods MUST have JavaDoc with @param, @return, and @throws."

📝 Example JavaDoc structure
+/**
+ * User-defined function for relevance query operations.
+ * This function is only supported when pushed down to the backend.
+ */
 public class RelevanceQueryFunction extends ImplementorUDF {
 
   public RelevanceQueryFunction() {
     super(new RelevanceQueryImplementor(), NullPolicy.ANY);
   }
 
+  /**
+   * Returns the type inference for this function.
+   * 
+   * @return BOOLEAN type inference
+   */
   @Override
   public SqlReturnTypeInference getReturnTypeInference() {
     return ReturnTypes.BOOLEAN;
   }
 
-  /*
-   * The first parameter is always required (either fields or query).
+  /**
+   * Defines the operand metadata for relevance query functions.
+   * 
+   * The first parameter is always required (either fields or query).
    * The second parameter is query when fields are present, otherwise it's the first parameter.
    * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries.
    * Different query has different parameter set, which will be validated in dedicated query builder.
    * Query parameter is always required and cannot be null.
+   * 
+   * @return operand metadata accepting 1-25 MAP-type parameters
    */
   @Override
   public UDFOperandMetadata getOperandMetadata() {

Also applies to: 28-30, 40-45

core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java (1)

466-490: Add ranking window functions to WINDOW_FUNC_MAPPING to support ROW_NUMBER, RANK, and DENSE_RANK in Calcite path.

The current implementation of visitWindowFunction validates window functions against BuiltinFunctionName.WINDOW_FUNC_MAPPING, which only includes aggregate window functions (MAX, MIN, AVG, COUNT, SUM, etc.). Ranking window functions like ROW_NUMBER, RANK, and DENSE_RANK are defined as enum constants in BuiltinFunctionName but are missing from WINDOW_FUNC_MAPPING. This causes these functions to fail with UnsupportedOperationException when used through the Calcite path, even though PlanUtils.makeOver has handling for ROW_NUMBER.

Add the missing ranking window functions to WINDOW_FUNC_MAPPING:

  • row_number → ROW_NUMBER
  • rank → RANK
  • dense_rank → DENSE_RANK

Verify that PplValidator or another upstream component does not already handle this validation separately before adding to avoid duplication.

core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)

243-247: Enhance JavaDoc with parameter and return value documentation.

The method has @Override annotation, making public visibility necessary to match the parent class CalcitePrepareImpl#getPreparingStmt. Per coding guidelines, add @param tags for context, elementType, catalogReader, planner and @return tag documenting the CalcitePreparingStmt object returned.

🧹 Nitpick comments (60)
core/src/main/java/org/opensearch/sql/expression/function/udf/math/DivideFunction.java (1)

45-48: Consider adding JavaDoc for consistency with coding standards.

While the method's purpose is clear, adding JavaDoc would align with the project's coding guidelines that require documentation for public methods.

📝 Suggested JavaDoc
+  /**
+   * Returns the SQL kind for this operator.
+   *
+   * @return SqlKind.DIVIDE indicating this is a division operation
+   */
   @Override
   public SqlKind getKind() {
     return SqlKind.DIVIDE;
   }
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java (4)

59-65: Consider adding braces for single-line conditionals.

The single-line if statements without braces reduce readability and can lead to maintenance issues.

Suggested improvement
         if (function1 instanceof SqlCall call) {
           List<SqlNode> operands = call.getOperandList();
           // The first argument is base (accumulator), while the second is from the array
-          if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), baseType);
-          if (operands.size() > 1 && elementType != null)
-            validator.setValidatedNodeType(operands.get(1), elementType);
+          if (!operands.isEmpty()) {
+            validator.setValidatedNodeType(operands.get(0), baseType);
+          }
+          if (operands.size() > 1 && elementType != null) {
+            validator.setValidatedNodeType(operands.get(1), elementType);
+          }
         }

71-74: Same style issue: add braces for the single-line conditional.

Suggested improvement
           if (function2 instanceof SqlCall call) {
             List<SqlNode> operands = call.getOperandList();
-            if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), returnType);
+            if (!operands.isEmpty()) {
+              validator.setValidatedNodeType(operands.get(0), returnType);
+            }
           }

46-84: Missing JavaDoc for public method.

As per coding guidelines, public methods should have JavaDoc with @return documentation. This method has complex type inference logic that would benefit from documentation explaining the different binding paths.

Suggested JavaDoc
+  /**
+   * Returns type inference logic for the reduce function.
+   * <p>
+   * For {@link RexCallBinding}, returns the last operand's type directly.
+   * For {@link SqlCallBinding}, derives the return type by manipulating validator's
+   * validated node types to properly infer lambda return types.
+   *
+   * @return the SQL return type inference function
+   */
   @Override
   public SqlReturnTypeInference getReturnTypeInference() {

109-139: Consider improving type safety in eval method.

While unchanged, this method uses raw types and unchecked casts. The @SuppressWarnings("unchecked") annotation would document intentional unchecked operations, improving maintainability.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java (1)

24-33: Consider expanding test coverage for mixed-type scenarios.

The test validates the core behavior of mixed-type IN predicates with Calcite. To strengthen test coverage and align with testing guidelines, consider adding test cases for:

  • NULL values in the IN list
  • Empty IN list
  • Different numeric type combinations (long, double with string)
  • All-string or all-numeric IN lists for comparison
  • Boundary conditions (single value, many values)

Based on learnings, testing SQL generation and optimization paths for Calcite integration changes helps ensure correctness.

Example additional test cases
@Test
public void testInWithNullInList() throws IOException {
  // Verify NULL handling in IN predicate
  JSONObject result = executeQuery(
      String.format("source=%s | where balance in (4180, null, '6077') | fields firstname",
          TEST_INDEX_ACCOUNT));
  // Add appropriate assertions
}

@Test
public void testInWithAllStringTypes() throws IOException {
  // Verify all-string IN predicate still works
  JSONObject result = executeQuery(
      String.format("source=%s | where balance in ('4180', '5686', '6077') | fields firstname",
          TEST_INDEX_ACCOUNT));
  // Add appropriate assertions
}

@Test
public void testInWithMixedNumericTypes() throws IOException {
  // Verify different numeric types coerce properly
  JSONObject result = executeQuery(
      String.format("source=%s | where balance in (4180, 5686.0, '6077') | fields firstname",
          TEST_INDEX_ACCOUNT));
  // Add appropriate assertions
}
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yaml (1)

1-15: LGTM! Comprehensive test case for aggregation and limit pushdown.

This expected output YAML effectively tests a complex query plan involving:

  • Multiple limit levels (system limit, pagination, inner fetch)
  • Offset handling for pagination
  • Aggregation pushdown to OpenSearch with composite buckets
  • Edge cases like missing value handling (missing_bucket: true, missing_order: first)

The logical and physical plans are consistent, and the PushDownContext correctly reflects the optimizations (AGGREGATION and LIMIT pushdowns). The test data is realistic and representative of production query patterns.


Note: Line 15 is quite long (~800+ characters) due to the serialized PushDownContext with embedded JSON. While this makes manual reading challenging, it's appropriate for test fixtures where exact string matching is required.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (6)

6-6: Package naming convention issue.

The package name CollectionUDF uses PascalCase, which violates Java naming conventions. Package names should be all lowercase (e.g., collectionudf or collection.udf). This appears to be a pre-existing issue, but it's worth noting for future refactoring.


84-102: Unreachable switch cases.

The ANY and IGNORE cases in this switch statement are unreachable because the families list on line 66-67 only contains ARRAY and FUNCTION. Consider removing this dead code to improve clarity, or document why it's preserved (e.g., for future extensibility).


62-150: Missing JavaDoc for the anonymous class methods.

Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws. The methods in this anonymous SqlSingleOperandTypeChecker implementation lack documentation. While the class-level comment references FamilyOperandTypeChecker, adding brief documentation for checkOperandTypes, getOperandCountRange, and getAllowedSignatures would improve maintainability.


166-167: Unchecked cast without null/type validation.

The cast (List<Object>) args[0] could throw a ClassCastException if args[0] is not a List. Consider adding a defensive check or documenting the contract that guarantees args[0] is always a List.

🔎 Proposed defensive check
  public static Object eval(Object... args) {
-   List<Object> target = (List<Object>) args[0];
+   if (!(args[0] instanceof List)) {
+     throw new IllegalArgumentException("First argument must be a List");
+   }
+   @SuppressWarnings("unchecked")
+   List<Object> target = (List<Object>) args[0];
    List<Object> results = new ArrayList<>();

185-187: Broad exception handling loses context.

Catching generic Exception and re-throwing as RuntimeException discards the original exception's type information. Consider catching more specific exceptions or preserving the exception chain with a descriptive message.

🔎 Example improvement
        } catch (Exception e) {
-         throw new RuntimeException(e);
+         throw new RuntimeException("Failed to apply lambda function to array element", e);
        }

Also applies to: 215-217


46-49: Missing JavaDoc on public methods.

Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws. The methods getReturnTypeInference(), getOperandMetadata(), and eval() are public but lack documentation. Consider adding JavaDoc to improve maintainability.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_with_search.yaml (1)

1-10: Test data covers basic range query; verify edge cases in companion tests.

The test data uses realistic timestamp values ('2023-01-01' and '2023-01-03') for a basic range filter scenario. As per coding guidelines, ensure companion test files cover edge cases and boundary conditions such as:

  • Null/missing timestamp values
  • Timezone handling (the generated query uses UTC 'Z' timezone)
  • Boundary inclusivity (>= vs >, < vs <=)
  • Invalid/malformed timestamp strings (especially relevant given PR's CAST → SAFE_CAST changes)
  • Single-point equality vs range queries
integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java (3)

154-159: Complete the migration to YAML-based assertions.

This test still uses JSON format (explain_count_agg_push.json and assertJsonEqualsIgnoreId) while most other tests have been migrated to YAML. For consistency with the PR's migration objectives, consider completing the migration to YAML format.

🔎 Suggested migration to YAML
  @Test
  public void testCountAggPushDownExplain() throws IOException {
-    String expected = loadExpectedPlan("explain_count_agg_push.json");
-    assertJsonEqualsIgnoreId(
+    String expected = loadExpectedPlan("explain_count_agg_push.yaml");
+    assertYamlEqualsIgnoreId(
        expected,
-        explainQueryToString("source=opensearch-sql_test_index_account | stats count() as cnt"));
+        explainQueryYaml("source=opensearch-sql_test_index_account | stats count() as cnt"));
  }

Note: This assumes the corresponding YAML resource file exists or will be created.


192-198: Complete the migration to YAML-based assertions.

This test still uses JSON format while most other tests have been migrated to YAML. For consistency, consider migrating this test to YAML format as well.

🔎 Suggested migration to YAML
  @Test
  public void testSortWithTypePushDownExplain() throws IOException {
-    String expected = loadExpectedPlan("explain_sort_type_push.json");
-    assertJsonEqualsIgnoreId(
+    String expected = loadExpectedPlan("explain_sort_type_push.yaml");
+    assertYamlEqualsIgnoreId(
        expected,
-        explainQueryToString(
+        explainQueryYaml(
            "source=opensearch-sql_test_index_account | sort num(age) | fields age"));
  }

Note: This assumes the corresponding YAML resource file exists or will be created.


201-213: Complete the migration to YAML-based assertions.

This test still uses JSON format while most other tests have been migrated to YAML. For consistency with the broader migration in this PR, consider migrating this test to YAML format.

🔎 Suggested migration to YAML
  @Test
  public void testSortWithAggregationExplain() throws IOException {
    // Sorts whose by fields are aggregators should not be pushed down
-    String expected = loadExpectedPlan("explain_sort_agg_push.json");
-    assertJsonEqualsIgnoreId(
+    String expected = loadExpectedPlan("explain_sort_agg_push.yaml");
+    assertYamlEqualsIgnoreId(
        expected,
-        explainQueryToString(
+        explainQueryYaml(
            "source=opensearch-sql_test_index_account"
                + "| stats avg(age) AS avg_age by state, city "
                + "| sort avg_age"));

    // sorts whose by fields are not aggregators can be pushed down.
    // This test is covered in testExplain
  }

Note: This assumes the corresponding YAML resource file exists or will be created.

integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_sort_push.yaml (1)

7-7: Consider maintainability of explicit field listings.

The expanded LogicalProject explicitly lists all 17 fields. While this provides full visibility for test validation, consider whether:

  1. This verbosity is necessary for the test's purpose (verifying limit-then-sort pushdown)
  2. Future schema changes to the test index would require updating this expected output

If this explicit listing is required by the new Calcite-based plan generation (as opposed to a more compact wildcard or implicit projection), this is acceptable. The field list is realistic and the indices ($0-$16) are correctly maintained through the projection chain.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1)

1-8: Consider edge case coverage for timestamp-string comparison tests.

This expected output file tests a basic timestamp range query with exclusive bounds. Per coding guidelines for test resources, verify that related test files cover important edge cases:

  • Inclusive boundary conditions (>=, <=)
  • Exact boundary values (timestamps matching filter bounds)
  • Null timestamp values
  • Malformed or invalid timestamp strings
  • Timezone variations and DST transitions
  • Timestamp precision edge cases (milliseconds, microseconds, nanoseconds)

While this file appropriately represents the expected output for its specific scenario, comprehensive edge case coverage is important for the new Calcite-based type coercion system introduced in this PR.

As per coding guidelines, ensure test data covers edge cases and boundary conditions.

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml (2)

8-8: Verify performance implications of FILTER→SCRIPT pushdown change.

The physical plan changed from FILTER-><>($0, 0) to SCRIPT-><>(SAFE_CAST($0), 0), shifting from a native OpenSearch filter to script-based evaluation. OpenSearch scripts typically have higher overhead than native filters and may not benefit from index optimizations.

Please confirm:

  1. Is this change intentional and necessary for the type coercion migration?
  2. Have performance benchmarks been run comparing FILTER vs SCRIPT approaches for simple predicates?
  3. Are there plans to optimize back to native filters for cases where SAFE_CAST is unnecessary?

Based on learnings, test SQL generation and optimization paths for Calcite integration changes.


1-8: Consider adding test coverage for SAFE_CAST edge cases.

This test validates the happy path where SAFE_CAST succeeds, but doesn't cover edge cases such as:

  • SAFE_CAST returning NULL when type conversion fails
  • How NULL values are handled in the <> comparison
  • Malformed or out-of-range values that trigger safe cast failures

Since this PR introduces a new type coercion framework with SAFE_CAST, consider adding complementary test cases that validate error handling and null semantics.

Would you like me to help identify other clickbench queries that could be extended to cover these edge cases, or open an issue to track comprehensive SAFE_CAST test coverage?

integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml (1)

11-11: Physical plan expectation is comprehensive but potentially brittle.

The physical plan includes detailed script serialization with SAFE_CAST operations that correctly mirror the logical plan. While this provides thorough validation of the Calcite-to-OpenSearch translation path, the embedded base64-encoded script and detailed request builder structure make this expectation sensitive to:

  • Script serialization format changes
  • Internal representation tweaks
  • Field ordering or naming adjustments

Consider whether a more focused assertion on the key elements (SAFE_CAST presence, EXPR_TIMESTAMP usage, field mappings) might be more maintainable while still validating the critical behavior introduced by this PR.

Alternatively, if comprehensive end-to-end validation is the goal, ensure that test failure messages clearly indicate which part of the expectation mismatched (logical vs physical, script content vs request structure) to aid debugging.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1)

1-8: Consider adding test cases for SAFE_CAST edge cases.

This test expectation validates the explain plan structure with SAFE_CAST, but uses well-formed integer literals (32, 39225). Per the PR objectives stating "replace CAST with SAFE_CAST for malformed strings," consider whether additional test cases should cover:

  • Malformed or invalid string inputs that SAFE_CAST should handle gracefully
  • Null value handling through SAFE_CAST
  • Type mismatches (e.g., comparing numeric fields to non-numeric literals)
  • Boundary conditions (overflow, underflow, out-of-range values)

This would ensure the test validates actual safe casting behavior, not just plan structure.

integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1)

5-5: Test data is realistic and TIME() function behavior is correct; add simpler test cases for edge case coverage.

The TIME function is a standard Calcite function with multiple overloads, and the test correctly validates its behavior. EXPR_TIME VARCHAR is a valid OpenSearch user-defined type (UDT) that maps TIME to VARCHAR, and the physical plan correctly shows time extraction: Sarg[('12:00:00.123456789':EXPR_TIME VARCHAR..'19:00:00.123456789':EXPR_TIME VARCHAR)].

The test data uses OpenSearch's "hh:mm:ss a" date format (12-hour with AM/PM) and tests nanosecond precision, which is realistic and covers a boundary condition. However, to fully satisfy the coding guideline of ensuring edge case and boundary condition coverage, add simpler test cases with time-only literals (e.g., TIME('12:00:00'), TIME('00:00:00'), TIME('23:59:59')) to validate basic time comparison functionality.

core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)

32-38: Consider updating the comment to reflect the simplified validation.

The comment block describes the parameter structure in detail, but the new validation simply checks that all parameters are MAP types. While the comment provides useful context about parameter semantics and notes that detailed validation happens in the "dedicated query builder," consider adding a note that the type-level validation here only enforces MAP types for all parameters, with detailed parameter structure validation delegated to the query builder.

integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml (1)

1-10: Test case correctly represents the expected plan structure.

This expected output file properly covers:

  • ✓ Nested LIMIT operations (10 and 5) with proper Calcite representation using LogicalSort(fetch=...)
  • ✓ Projection chain testing (all fields → age only)
  • ✓ Push-down optimization validation in the physical plan
  • ✓ Correct YAML format matching the schema (calcite → logical/physical structure)
  • ✓ Realistic field references for an OpenSearch account index

The test data appears realistic and the format is correct. The physical plan demonstrates proper optimization by pushing down both the projection (PROJECT->[age]) and the effective limit (LIMIT->5).

Optional: Consider whether the broader test suite covers additional edge cases like LIMIT 0, or cases where the inner limit exceeds the outer limit, though this is beyond the scope of this single test file.

integ-test/src/test/resources/expectedOutput/calcite/explain_bin_span.yaml (1)

4-6: Logical plan correctly reflects simplified projection structure.

The removal of the top-level projection and adjustment of field indices ($8→$9 for email, $9→$10 for lastname) aligns with the PR's goal to simplify Calcite logical plans. The SPAN_BUCKET operation with a fixed width of 10 is correctly represented.

Consider adding complementary test cases for edge scenarios:

  • NULL or missing age values
  • Age values at bucket boundaries (e.g., exactly 0, 10, 20)
  • Zero or negative span parameters (expected to fail validation)
  • Empty result sets

These would provide more comprehensive coverage of the binning functionality under the new Calcite type-checking framework.

integ-test/src/test/resources/expectedOutput/calcite/explain_bin_minspan.yaml (1)

1-12: MINSPAN_BUCKET plan correctly implements dynamic binning with window aggregations.

The explain plan accurately represents the MINSPAN_BUCKET operation, which requires window functions to compute MIN/MAX boundaries before bucketing. The physical plan correctly sequences operations: full scan for window aggregation → limit → bucket calculation → final limit.

The requestedTotalSize=2147483647 (Integer.MAX_VALUE) is appropriate for window functions that need access to all rows before applying the fetch=5 limit. Consider adding test coverage for edge cases:

  • All age values identical (MIN($8) = MAX($8))
  • NULL age values
  • Single-row result sets
  • min_span parameter larger than actual data range
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java (1)

24-24: Add class-level JavaDoc.

The public class lacks class-level JavaDoc. Per coding guidelines, all public classes must have proper JavaDoc documentation explaining the class's purpose, when to use it, and how it fits into the broader Calcite integration.

Suggested JavaDoc structure
+/**
+ * Custom SqlToRelConverter for PPL that handles Spark SQL-specific join types.
+ *
+ * <p>This converter extends Calcite's standard {@link SqlToRelConverter} to support
+ * LEFT SEMI JOIN and LEFT ANTI JOIN semantics, which are not natively supported by
+ * Calcite's default join type conversion. The converter temporarily transforms these
+ * join types to LEFT JOIN for Calcite processing, then corrects the join type in the
+ * resulting RelNode tree.
+ *
+ * @see PplRelToSqlConverter for the corresponding Rel-to-SQL conversion
+ */
 public class PplSqlToRelConverter extends SqlToRelConverter {
integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (1)

1-40: Expected Calcite plan output is correct; test covers primary null-handling scenario.

The expected output file accurately represents the Calcite logical and physical execution plans for the chart query with null string handling. The test query source=%s | chart limit=10 usenull=true nullstr='nil' avg(balance) over gender by age span=10 does test null handling through:

  • IS NULL checks in the CASE expression
  • SAFE_CAST for type conversions
  • SPAN with null parameter (null:ANY)
  • Boundary coverage at age=30 (span=10 with data ages 28-36)
  • Multiple null/missing values in test data (age, balance, gender fields)

However, test data could be expanded to cover additional edge cases: empty strings, non-numeric string values, and more span boundary conditions (exactly 0, negative values, larger boundaries). The current data is representative for the primary null-handling scenario but limited for comprehensive boundary testing.

core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java (1)

40-43: Add JavaDoc for public method.

The new getKind() method is public but lacks JavaDoc documentation. Per coding guidelines, all public methods should have proper JavaDoc with @return documentation.

📝 Suggested JavaDoc addition
+  /**
+   * Returns the SQL kind for this function to enable Calcite dialect integration.
+   *
+   * @return SqlKind.MOD to identify this function as a modulo operation
+   */
   @Override
   public SqlKind getKind() {
     return SqlKind.MOD;
   }
core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java (1)

149-163: Constructor placement after methods is unconventional.

The constructor on lines 149-163 appears after the visit methods (lines 127-147). Consider moving the constructor to appear after the field declarations (after line 96) for better readability, following standard Java class structure conventions.

core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java (2)

21-21: Missing class-level JavaDoc.

Per coding guidelines, all public classes should have proper JavaDoc. Add documentation explaining the shuttle's purpose (schema qualification stripping, COUNT normalization, tuple IN/NOT_IN ROW-wrapping).

Suggested JavaDoc
+/**
+ * A SqlShuttle that rewrites SQL nodes during validation to ensure compatibility
+ * with Calcite's type derivation and SQL-to-Rel conversion.
+ *
+ * <p>Rewrites performed:
+ * <ul>
+ *   <li>Strips OpenSearch schema qualifier from two-part identifiers
+ *   <li>Converts COUNT() to COUNT(*) for proper type derivation
+ *   <li>Wraps SqlNodeList in ROW for tuple IN/NOT_IN predicates
+ * </ul>
+ */
 public class SqlRewriteShuttle extends SqlShuttle {

62-65: Mutating input SqlCall may cause unintended side effects.

The setOperand(0, ...) call on line 64 mutates the original SqlCall object before passing it to super.visit(). If the same SqlCall instance is shared or visited multiple times, this mutation could cause unexpected behavior.

Consider creating a new SqlBasicCall with the modified operand instead:

🔎 Proposed fix
       if (!call.getOperandList().isEmpty()
           && call.getOperandList().get(0) instanceof SqlNodeList nodes) {
-        call.setOperand(0, SqlStdOperatorTable.ROW.createCall(nodes));
+        List<SqlNode> newOperands = new ArrayList<>(call.getOperandList());
+        newOperands.set(0, SqlStdOperatorTable.ROW.createCall(nodes));
+        call = call.getOperator().createCall(
+            call.getFunctionQuantifier(),
+            call.getParserPosition(),
+            newOperands.toArray(new SqlNode[0]));
       }
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java (2)

140-171: toSqlHint lacks null-safety for hint parameter.

The toSqlHint method checks hint.kvOptions and hint.listOptions for null but doesn't guard against a null hint parameter itself. If a null hint is passed (e.g., from a stream with null elements), this will throw NPE.

🔎 Suggested fix
   private static SqlHint toSqlHint(RelHint hint, SqlParserPos pos) {
+    if (hint == null) {
+      throw new IllegalArgumentException("RelHint cannot be null");
+    }
     if (hint.kvOptions != null) {

125-129: Consider using Collectors.toList() for immutable result.

The hint collection uses Collectors.toCollection(ArrayList::new) but SqlNodeList.of() doesn't require mutability. Using Collectors.toList() (or toList() in Java 16+) would be more idiomatic.

       List<SqlNode> hints =
           e.getHints().stream()
               .map(relHint -> (SqlNode) toSqlHint(relHint, POS))
-              .collect(Collectors.toCollection(ArrayList::new));
+              .collect(Collectors.toList());
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ForallFunctionImpl.java (1)

54-68: Consider adding defensive checks in eval() method.

The eval() method performs unchecked casts that could fail at runtime:

  • Line 57: (List<Object>) args[0] - NPE if args[0] is null, ClassCastException if not a List
  • Line 60: (Boolean) lambdaFunction.apply(candidate) - ClassCastException if lambda returns non-Boolean

While operand type checking should prevent most misuse, adding defensive null checks would improve robustness.

🔎 Suggested defensive checks
   public static Object eval(Object... args) {
+    if (args == null || args.length < 2 || args[0] == null || args[1] == null) {
+      return null;
+    }
     org.apache.calcite.linq4j.function.Function1 lambdaFunction =
         (org.apache.calcite.linq4j.function.Function1) args[1];
     List<Object> target = (List<Object>) args[0];
     try {
       for (Object candidate : target) {
-        if (!(Boolean) lambdaFunction.apply(candidate)) {
+        Object result = lambdaFunction.apply(candidate);
+        if (!(result instanceof Boolean) || !(Boolean) result) {
           return false;
         }
       }
     } catch (Exception e) {
       throw new RuntimeException(e);
     }
     return true;
   }
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonKeysFunctionImpl.java (1)

36-38: Add JavaDoc for public method.

The operand metadata change correctly enables Calcite type validation with OperandTypes.ANY, which is appropriate for a function accepting any JSON input. However, per coding guidelines, all public methods must have JavaDoc with @return documentation.

📋 Suggested JavaDoc
+  /**
+   * Returns the operand metadata for the JSON_KEYS function.
+   *
+   * @return UDFOperandMetadata accepting any operand type
+   */
   @Override
   public UDFOperandMetadata getOperandMetadata() {
     return UDFOperandMetadata.wrap(OperandTypes.ANY);
   }
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMaxFunction.java (1)

40-42: Add JavaDoc for public method.

The operand metadata change correctly uses OperandTypes.VARIADIC for the MAX function's variable-length argument list, properly enabling Calcite type validation. However, per coding guidelines, all public methods must have JavaDoc with @return documentation.

📋 Suggested JavaDoc
+  /**
+   * Returns the operand metadata for the SCALAR_MAX function.
+   *
+   * @return UDFOperandMetadata accepting variadic operands
+   */
   @Override
   public UDFOperandMetadata getOperandMetadata() {
     return UDFOperandMetadata.wrap(OperandTypes.VARIADIC);
   }
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java (1)

38-41: Consider adding JavaDoc for public method.

Per coding guidelines, public methods should have JavaDoc documentation. Since the PR objectives list "documentation/javadocs" as outstanding work, this can be addressed in that effort.

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java (1)

43-46: Consider adding JavaDoc for public method.

While this method overrides a parent declaration, adding JavaDoc with an @return description would improve code documentation and maintainability, explaining that the method returns variadic operand metadata to support 2+ argument validation.

📝 Suggested JavaDoc addition
   @Override
+  /**
+   * Returns operand metadata for json_extract function.
+   *
+   * @return Variadic operand metadata allowing 2 or more arguments
+   */
   public UDFOperandMetadata getOperandMetadata() {
     return UDFOperandMetadata.wrap(OperandTypes.VARIADIC);
   }

As per coding guidelines, public methods should have proper JavaDoc documentation.

core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java (1)

19-19: LGTM! Centralized type checking improves maintainability.

Migrating from a local dateRelatedType() method to OpenSearchTypeUtil.isDatetime() centralizes type checking logic and reduces code duplication. This aligns with the PR's goal of standardizing type handling through Calcite's RelDataType framework.

Also applies to: 53-57

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java (1)

23-23: LGTM! Explicit operand metadata improves validation.

Replacing the null return with UDFOperandMetadata.wrap(OperandTypes.ONE_OR_MORE) enables proper operand type checking and validation for the JSON_DELETE function. This aligns with similar changes across other JSON UDF implementations in the PR.

Consider adding a unit test to verify the ONE_OR_MORE operand metadata behavior, especially for edge cases like:

  • Single path argument (minimum case)
  • Multiple path arguments
  • Invalid operand counts (if validation is enforced)

As per coding guidelines, new business logic changes should have corresponding unit tests.

Also applies to: 40-40

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)

1809-1809: Track TODO for chart pushdown regression.

This TODO indicates that the chart with limit/useother options may have regressed from optimal pushdown behavior. Consider creating an issue to track this for future resolution.

Would you like me to help create an issue to track this pushdown optimization?

integ-test/src/test/java/org/opensearch/sql/calcite/clickbench/PPLClickBenchIT.java (1)

68-71: Consider adding a comment explaining why q41 requires special handling.

The code adds a conditional ignore for q41 with the comment "Ignore q41 as it needs special handling". Consider expanding this comment to briefly explain what makes q41 special (e.g., "requires alternative plan comparison" or "non-deterministic plan generation").

🔎 Suggested comment improvement
     if (isCalciteEnabled()) {
-      // Ignore q41 as it needs special handling
+      // Ignore q41 as it requires separate test with alternative plan comparison
       ignored.add(41);
     }
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)

458-465: Consider adding schema verification for consistency.

The test validates data rows but doesn't verify the schema, unlike the adjacent testBetweenWithIncompatibleTypes which calls verifySchema. For consistency with other BETWEEN tests in this file, consider adding schema verification.

🔎 Suggested enhancement
   @Test
   public void testBetweenWithMixedTypes() throws IOException {
     JSONObject actual =
         executeQuery(
             String.format(
                 "source=%s | where age between '35' and 38 | fields firstname, age",
                 TEST_INDEX_BANK));
+    verifySchema(actual, schema("firstname", "string"), schema("age", "int"));
     verifyDataRows(actual, rows("Hattie", 36), rows("Elinor", 36));
   }
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

137-140: Misleading error message: context already exists when this check runs.

The error message states the provider must be set "before creating CalcitePlanContext," but this check occurs inside getValidator() on an already-created instance. Consider rewording for accuracy.

🔎 Suggested fix
       if (operatorTableProvider == null) {
         throw new IllegalStateException(
-            "SqlOperatorTableProvider must be set before creating CalcitePlanContext");
+            "SqlOperatorTableProvider must be set before calling getValidator()");
       }
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)

206-268: Consider adding JavaDoc for public inner class and methods.

Per coding guidelines, public classes and methods should have proper JavaDoc. The OpenSearchPrepareImpl class and its public perform and getPreparingStmt methods would benefit from documentation describing their purpose, parameters, and return values, especially since getPreparingStmt is now part of the public API surface.

core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)

63-74: Add JavaDoc for the public getCastSpec override.

This method handles a Calcite-specific workaround for IP UDT types. Per coding guidelines, public methods should have JavaDoc with @param and @return. The inline comments are helpful but a method-level JavaDoc would improve discoverability.

Suggested JavaDoc
+  /**
+   * Returns the cast specification for the given type.
+   *
+   * <p>For IP UDT types, returns a special {@link SqlDataTypeSpec} using {@link
+   * SqlAlienSystemTypeNameSpec} since IP borrows {@code SqlTypeName.OTHER} internally.
+   *
+   * @param type the target data type for the cast
+   * @return the cast specification SqlNode, or null if not applicable
+   */
   @Override
   public @Nullable SqlNode getCastSpec(RelDataType type) {
core/src/main/java/org/opensearch/sql/executor/QueryService.java (2)

313-328: Consider logging when tolerant validation is triggered.

The tolerant validation path at lines 324-326 silently returns the original relNode when ValidationUtils.tolerantValidationException(e) returns true. Adding a debug/trace log would help diagnose validation bypass scenarios during debugging.

Suggested enhancement
     } catch (CalciteContextException e) {
       if (ValidationUtils.tolerantValidationException(e)) {
+        log.debug("Tolerant validation triggered, bypassing validation: {}", e.getMessage());
         return relNode;
       }
       throw new ExpressionEvaluationException(e.getMessage(), e);
     }

349-352: Validate the round-trip conversion properly restores RelNode structure.

The final conversion back through PplRelToSqlRelShuttle with false parameter (line 351) should reverse the interval literal transformations applied in line 311. The asymmetric boolean parameter usage (true then false) could benefit from named constants or an enum for clarity.

Consider using named constants for clarity
+  private static final boolean FIX_INTERVALS_FOR_SQL = true;
+  private static final boolean RESTORE_INTERVALS_FROM_SQL = false;
+
   private RelNode validate(RelNode relNode, CalcitePlanContext context) {
     // ...
-    RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, true));
+    RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, FIX_INTERVALS_FOR_SQL));
     // ...
-    return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, false));
+    return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, RESTORE_INTERVALS_FROM_SQL));
   }
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)

56-66: Consider adding inline comments for IP type representation.

The use of SqlTypeName.OTHER to represent IP types may not be immediately obvious to future maintainers. A brief inline comment would improve clarity.

🔎 Suggested improvement
   private static final Map<SqlTypeName, ImmutableSet<@NonNull SqlTypeName>> additionalMapping =
       Map.of(
+          // SqlTypeName.OTHER represents IP and other user-defined types
           SqlTypeName.OTHER,
           ImmutableSet.of(SqlTypeName.OTHER, SqlTypeName.VARCHAR, SqlTypeName.CHAR),
           SqlTypeName.VARCHAR,
           ImmutableSet.<SqlTypeName>builder()
               .add(SqlTypeName.OTHER)
               .addAll(SqlTypeName.NUMERIC_TYPES)
               .build(),
           SqlTypeName.CHAR,
           ImmutableSet.of(SqlTypeName.OTHER));
core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java (2)

26-26: Make INSTANCE field final for singleton safety.

The singleton instance should be final to prevent accidental reassignment.

🔎 Proposed fix
-  public static PplConvertletTable INSTANCE = new PplConvertletTable();
+  public static final PplConvertletTable INSTANCE = new PplConvertletTable();

25-27: Add class-level JavaDoc to document purpose.

Per coding guidelines, public classes should have proper JavaDoc. Consider documenting the purpose of this convertlet table and when it's used in the PPL validation flow.

🔎 Proposed fix
+/**
+ * PPL-specific convertlet table that customizes SQL-to-Rex conversion for PPL operators.
+ *
+ * <p>This table provides custom conversion logic for:
+ * <ul>
+ *   <li>IP-type comparisons: Routes to PPL IP comparison operators when operands are IP types</li>
+ *   <li>ATAN function: Rewrites PPL ATAN to standard Calcite ATAN</li>
+ * </ul>
+ */
 public class PplConvertletTable extends ReflectiveConvertletTable {
+  /** Singleton instance for use across the validation/conversion flow. */
   public static PplConvertletTable INSTANCE = new PplConvertletTable();
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)

332-386: Consider extracting type counting logic for readability.

The method has 8 counter variables and multiple conditionals. Consider extracting the counting loop into a helper class or record (e.g., TypeCounts) to improve readability and testability.

🔎 Example structure
private record TypeCounts(int nullCount, int anyCount, int nullableCount, 
                          int dateCount, int timeCount, int ipCount, 
                          int binaryCount, int otherCount) {
  static TypeCounts from(List<RelDataType> types) {
    // counting logic here
  }
}
core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java (1)

50-56: Document thread-safety implications of mutable 'top' field.

The top field is mutable instance state used to track recursive deriveType calls. If the same PplValidator instance is used concurrently (which is unlikely but possible), this could cause incorrect type derivation. Consider adding a comment noting that validator instances should not be shared across threads.
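
A sketch of the kind of note suggested above; only the field name top comes from this comment, while the field type and wording are assumptions.

  /**
   * Outermost query node of the current deriveType call chain.
   *
   * <p>Because this is mutable instance state, a PplValidator instance is not thread-safe
   * and must not be shared across threads.
   */
  private SqlNode top; // assumed type; documents why the validator is stateful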

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)

41-55: Consider refining the JavaDoc for clarity.

The method name is isNumericOrCharacter, but the JavaDoc mentions "OpenSearch UDT numeric types" without mentioning that UDT character types are not included. The logic only checks ExprCoreType.numberTypes() for UDTs, meaning UDT character types would return false - which may be intentional but should be documented.
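
A JavaDoc sketch capturing the clarification suggested above; the wording is a suggestion and the parameter name is assumed.

  /**
   * Returns whether the given type is a SQL numeric or character type, or an OpenSearch
   * UDT numeric type.
   *
   * <p>Note: UDT character types are intentionally not included; only
   * {@code ExprCoreType.numberTypes()} is consulted for UDTs, so a UDT character type
   * yields {@code false}.
   *
   * @param type the type to inspect (assumed parameter name)
   * @return true if the type is numeric or character as described above
   */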

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (3)

407-416: Consider stricter handling of duplicate function registrations.

The method now logs a warning when a function is registered multiple times (lines 411-412) and silently overwrites the previous registration. While the warning provides visibility, this approach could mask configuration errors.

Consider:

  1. Throwing an exception for duplicate registrations to fail-fast on misconfiguration
  2. Using ERROR level instead of WARN if duplicates should never occur in production
  3. Documenting if duplicate registration is an expected pattern (e.g., hot-reload scenarios)
Optional: Stricter duplicate handling
     if (externalFunctionRegistry.containsKey(functionName)) {
-      logger.warn(
-          String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
+      throw new IllegalStateException(
+          String.format(Locale.ROOT, "Function %s is already registered", functionName));
     }

425-438: Apply consistent duplicate handling for aggregate functions.

Similar to registerExternalOperator, this method warns on duplicates (lines 427-430) but continues with overwrite. Apply the same duplicate handling strategy consistently across both registration methods.


648-684: Document Calcite workarounds for composite type checkers.

Several operator registrations include TODO comments about workarounds for "sequence CompositeOperandTypeChecker" (lines 650, 656, 666, 679). These manual .or() chains replace Calcite's standard composite checkers.

Ensure these workarounds are:

  1. Documented in code with explanation of why manual chaining is needed
  2. Tracked as technical debt if Calcite's standard behavior should eventually be used
  3. Tested to confirm the manual type checking produces equivalent behavior

As per coding guidelines: Document Calcite-specific workarounds in code.
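
A minimal sketch of how such a workaround could be documented at the registration site; the checker combination below is illustrative only and is not one of the actual registrations in PPLFuncImpTable.

import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlOperandTypeChecker;
import org.apache.calcite.sql.type.SqlTypeFamily;

final class CompositeCheckerWorkaroundExample {
  // Calcite workaround: the standard sequence-style CompositeOperandTypeChecker cannot be
  // used here (see the TODOs around lines 650-679), so equivalent alternatives are chained
  // manually. Track as technical debt until the standard checker can be adopted, and keep
  // tests proving the manual chain accepts/rejects the same operand combinations.
  static final SqlOperandTypeChecker EXAMPLE_CHECKER =
      OperandTypes.or(
          OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.NUMERIC),
          OperandTypes.family(SqlTypeFamily.DATETIME, SqlTypeFamily.DATETIME));

  private CompositeCheckerWorkaroundExample() {}
}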

Comment on lines 129 to 136
public SqlValidator getValidator() {
if (validator == null) {
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
Contributor

⚠️ Potential issue | 🟠 Major

Potential resource leak: Statement is never closed.

The CalciteServerStatement created on line 133 is never closed. While it's used to provide context for the validator, it should either be stored for later cleanup or handled with try-with-resources if only needed temporarily for initialization.

🔎 Suggested fix

If the statement must remain open for the validator's lifecycle, store it as a field and close it when the context is disposed. Alternatively, if only needed for initialization:

+  /** Statement backing the validator (kept open for validator lifecycle). */
+  private CalciteServerStatement validatorStatement;
+
   public SqlValidator getValidator() {
     if (validator == null) {
-      final CalciteServerStatement statement;
       try {
-        statement = connection.createStatement().unwrap(CalciteServerStatement.class);
+        validatorStatement = connection.createStatement().unwrap(CalciteServerStatement.class);
       } catch (SQLException e) {
         throw new RuntimeException(e);
       }

Also consider implementing AutoCloseable on CalcitePlanContext to properly clean up resources.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129-136, the CalciteServerStatement created via
connection.createStatement().unwrap(...) is never closed which can leak
resources; either store this statement in a new private field and close it when
CalcitePlanContext is disposed (add a close() method or implement AutoCloseable
and close the field there), or if the statement is only needed for validator
initialization wrap its creation/use in a try-with-resources so it is closed
immediately after use; update class lifecycle to ensure the statement is closed
on context shutdown.

Collaborator Author

CalciteServerStatement does not implement AutoCloseable (although the statement created with connection.createStatement() does). Does this mean that I do not have to close it?
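
One possible shape for the cleanup, sketched under the assumption that CalcitePlanContext can own the JDBC Statement returned by connection.createStatement() (which is AutoCloseable, unlike the unwrapped CalciteServerStatement) and close it when the context is disposed; the class and field names are illustrative, not the actual code.

import java.sql.SQLException;
import java.sql.Statement;

final class ValidatorStatementHolderExample implements AutoCloseable {
  // The JDBC statement created via connection.createStatement(); the CalciteServerStatement
  // used by the validator is obtained by unwrapping this and stays valid while it is open.
  private final Statement jdbcStatement;

  ValidatorStatementHolderExample(Statement jdbcStatement) {
    this.jdbcStatement = jdbcStatement;
  }

  @Override
  public void close() throws SQLException {
    // Closing the JDBC statement should also release the server-side statement it wraps.
    jdbcStatement.close();
  }
}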

Comment on lines +35 to +91
/**
* Override to support Spark SQL's LEFT ANTI JOIN and LEFT SEMI JOIN conversion to RelNode.
*
* <p>The default implementation in {@link SqlToRelConverter#convertJoinType} does not expect
* LEFT_ANTI_JOIN and LEFT_SEMI_JOIN. This override works around the limitation by first
* temporarily changing LEFT_ANTI_JOIN/LEFT_SEMI_JOIN to LEFT join in the SqlJoin node, then
* calling {@code super.convertFrom()} to perform normal conversion, finally substituting the join
* type in the resulting RelNode to ANTI/SEMI.
*
* @param bb Scope within which to resolve identifiers
* @param from FROM clause of a query.
* @param fieldNames Field aliases, usually come from AS clause, or null
*/
@Override
protected void convertFrom(
Blackboard bb, @Nullable SqlNode from, @Nullable List<String> fieldNames) {
JoinType originalJoinType = null;
if (from instanceof SqlJoin join) {
JoinType joinType = join.getJoinType();
if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
join.setOperand(2, JoinType.LEFT.symbol(from.getParserPosition()));
originalJoinType = joinType;
}
}
super.convertFrom(bb, from, fieldNames);
if (originalJoinType != null) {
RelNode root = bb.root();
if (root != null) {
JoinRelType correctJoinType =
originalJoinType == JoinType.LEFT_SEMI_JOIN ? JoinRelType.SEMI : JoinRelType.ANTI;
RelNode fixedRoot = modifyJoinType(root, correctJoinType);
bb.setRoot(fixedRoot, false);
}
}
}

private RelNode modifyJoinType(RelNode root, JoinRelType correctJoinType) {
if (root instanceof LogicalProject project) {
RelNode input = project.getInput();
RelNode fixedInput = modifyJoinType(input, correctJoinType);
if (fixedInput != input) {
return project.copy(
project.getTraitSet(), fixedInput, project.getProjects(), project.getRowType());
}
} else if (root instanceof LogicalJoin join) {
if (join.getJoinType() == JoinRelType.LEFT) {
return join.copy(
join.getTraitSet(),
join.getCondition(),
join.getLeft(),
join.getRight(),
correctJoinType,
join.isSemiJoinDone());
}
}
return root;
}
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for test files related to PplSqlToRelConverter
echo "=== Searching for PplSqlToRelConverter test files ==="
find . -name "*Test*.java" -type f | xargs grep -l "PplSqlToRelConverter" 2>/dev/null

# Search for SEMI/ANTI join tests
echo -e "\n=== Searching for SEMI/ANTI join test references ==="
find . -name "*Test*.java" -type f | xargs grep -l "SEMI_JOIN\|ANTI_JOIN\|LEFT_SEMI\|LEFT_ANTI" 2>/dev/null | head -10

# Search for convertFrom method overrides in tests
echo -e "\n=== Searching for convertFrom tests ==="
find . -name "*Test*.java" -type f | xargs grep -l "convertFrom" 2>/dev/null

# List test files in the calcite validate directory
echo -e "\n=== Test files in calcite/validate directory ==="
find . -path "*/test/*" -path "*/calcite/validate*" -name "*Test*.java" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 359


🏁 Script executed:

#!/bin/bash
# Find the PplSqlToRelConverter file itself
echo "=== Finding PplSqlToRelConverter class file ==="
find . -name "PplSqlToRelConverter.java" -type f

# Search for any calcite-related tests
echo -e "\n=== Calcite-related test files ==="
find . -path "*/test/*" -path "*calcite*" -name "*.java" | head -20

# Check what's in SQLQueryValidatorTest related to SEMI/ANTI
echo -e "\n=== Checking SQLQueryValidatorTest for join tests ==="
find . -name "SQLQueryValidatorTest.java" -exec grep -A 5 -B 5 "SEMI_JOIN\|ANTI_JOIN" {} \;

# Search for SqlToRelConverter usage in tests
echo -e "\n=== SqlToRelConverter in tests ==="
find . -name "*Test*.java" -type f | xargs grep -l "SqlToRelConverter\|convertJoinType" 2>/dev/null

# Check the parent directory structure
echo -e "\n=== Directory structure for calcite/validate ==="
find . -path "*/calcite/validate/*" -type d | head -10

Repository: opensearch-project/sql

Length of output: 4838


Add unit tests for PplSqlToRelConverter SEMI/ANTI join conversion.

Per coding guidelines for Calcite integration changes, comprehensive test coverage for SQL generation and optimization paths is required. This new converter lacks dedicated tests for its SEMI/ANTI join conversion logic. Add unit tests covering:

  • LEFT_SEMI_JOIN to SEMI conversion
  • LEFT_ANTI_JOIN to ANTI conversion
  • Edge cases (nested projects, multiple join levels)
🤖 Prompt for AI Agents
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java
lines 35-91: add unit tests that verify LEFT_SEMI_JOIN and LEFT_ANTI_JOIN are
converted to JoinRelType.SEMI and JoinRelType.ANTI respectively, including edge
cases (project-wrapped join and nested/multi-level joins). Create tests in the
corresponding test package (core/src/test/java/.../validate/converters) that
construct SqlJoin nodes with LEFT_SEMI_JOIN and LEFT_ANTI_JOIN, run through
PplSqlToRelConverter.convertFrom (or call the public entry point that uses it),
then assert on the resulting RelNode tree that the join nodes use SEMI/ANTI; for
project-wrapped cases ensure the converter drills through LogicalProject (assert
join type changed and project preserved), and for nested joins verify only the
target join(s) are modified. Include separate test methods with descriptive
names (e.g., convertLeftSemiJoinToSemi, convertLeftAntiJoinToAnti,
convertJoinThroughProject, convertNestedJoins) and cover both RelNode inspection
and round-trip SQL/rel equivalence where applicable.
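
A rough JUnit sketch of the suggested coverage; convertPpl and findJoin are placeholder hooks for whatever harness the tests end up using to drive PplSqlToRelConverter, and the PPL strings mirror the join syntax discussed later in this thread rather than exact test queries.

import static org.junit.Assert.assertEquals;

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.rel.logical.LogicalJoin;
import org.junit.Test;

abstract class PplSemiAntiJoinConversionTestSketch {
  // Placeholder: run a PPL query through the SqlNode round trip and return the RelNode.
  abstract RelNode convertPpl(String ppl);

  // Placeholder: locate the (possibly project-wrapped) LogicalJoin in the plan.
  abstract LogicalJoin findJoin(RelNode root);

  @Test
  public void convertLeftSemiJoinToSemi() {
    RelNode rel = convertPpl("source=t1 | left semi join left=a, right=b on a.name = b.name t2");
    assertEquals(JoinRelType.SEMI, findJoin(rel).getJoinType());
  }

  @Test
  public void convertLeftAntiJoinToAnti() {
    RelNode rel = convertPpl("source=t1 | left anti join left=a, right=b on a.name = b.name t2");
    assertEquals(JoinRelType.ANTI, findJoin(rel).getJoinType());
  }
}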

Comment on lines +74 to +78

@Override
public SqlOperandCountRange getOperandCountRange() {
return getOperandMetadata().getOperandCountRange();
}
Contributor

⚠️ Potential issue | 🟡 Minor

Potential NPE when getOperandMetadata() returns null.

The override delegates to getOperandMetadata().getOperandCountRange(), but getOperandMetadata() can return null (as seen in some UDF implementations before this PR's changes). If any UDF still returns null, this will cause a NullPointerException.

🔎 Proposed fix with null safety
       @Override
       public SqlOperandCountRange getOperandCountRange() {
-        return getOperandMetadata().getOperandCountRange();
+        UDFOperandMetadata metadata = getOperandMetadata();
+        return metadata != null ? metadata.getOperandCountRange() : super.getOperandCountRange();
       }
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java
around lines 74-78, the method currently calls
getOperandMetadata().getOperandCountRange() which can NPE if
getOperandMetadata() returns null; guard by checking if getOperandMetadata() ==
null and handle that case explicitly — either return a safe default
SqlOperandCountRange or throw an IllegalStateException with a clear message
(including the UDF class/name via getClass().getSimpleName()) so callers fail
fast and the null is surfaced safely.

Comment on lines +217 to +218
<!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 -->
```ppl ignore
Contributor

⚠️ Potential issue | 🟡 Minor

Fix typo in TODO comment.

Line 217 contains "Enbale" which should be "Enable".

-<!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 -->
+<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4968 -->
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 -->
```ppl ignore
<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4968 -->
🤖 Prompt for AI Agents
In docs/user/ppl/cmd/patterns.md around lines 217 to 218, fix the typo in the
TODO comment: change "Enbale" to "Enable" so the comment reads "Enable after
fixing https://github.com/opensearch-project/sql/issues/4968".

Comment on lines 4 to +5
LogicalAggregate(group=[{}], count(firstname)=[COUNT($0)], count(name)=[COUNT($1)])
LogicalProject(firstname=[$1], name=[$10])
LogicalProject(firstname=[$1], lastname=[$10])
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the actual test file to verify content
if [ -f "integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml" ]; then
  echo "=== File exists, reading lines 1-15 ==="
  head -15 integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml
else
  echo "File not found, searching for similar files..."
  find . -name "*explain_count_agg*.yaml" -type f 2>/dev/null | head -5
fi

Repository: opensearch-project/sql

Length of output: 966


Update aggregation output field name to match projected field.

The LogicalAggregate at line 4 labels the output as count(name)=[COUNT($1)], but position $1 from the LogicalProject at line 5 is lastname, not name. This creates a schema mismatch where the output field name doesn't reflect the actual field being aggregated. Verify whether the aggregation label should be updated to count(lastname) to match the projection, or confirm this naming behavior is intended as part of the SqlNode conversion changes.

🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml
around lines 4-5, the LogicalAggregate output is labeled count(name) but the
LogicalProject’s $1 maps to lastname; update the aggregate output label to
count(lastname) (i.e., change count(name)=[COUNT($1)] to
count(lastname)=[COUNT($1)]) so the aggregation field name matches the projected
field, or if the naming is intentionally preserved by SqlNode conversion, add a
comment in the test explaining the expected behavior and leave the label as-is.

CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
physical: |
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={},cnt=COUNT($0)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"cnt":{"value_count":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQBQ3sKICAib3AiOiB7CiAgICAibmFtZSI6ICIrIiwKICAgICJraW5kIjogIlBMVVMiLAogICAgInN5bnRheCI6ICJCSU5BUlkiCiAgfSwKICAib3BlcmFuZHMiOiBbCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJCSUdJTlQiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2],"DIGESTS":["balance",1]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#1105:LogicalAggregate.NONE.[](input=RelSubset#1104,group={},cnt=COUNT($0)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"cnt":{"value_count":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQBQ3sKICAib3AiOiB7CiAgICAibmFtZSI6ICIrIiwKICAgICJraW5kIjogIlBMVVMiLAogICAgInN5bnRheCI6ICJCSU5BUlkiCiAgfSwKICAib3BlcmFuZHMiOiBbCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJCSUdJTlQiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1765952821451990000,"SOURCES":[0,2],"DIGESTS":["balance",1]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for utcTimestamp usage in the codebase
rg "utcTimestamp" -C 3 --type java --type yaml -i

Repository: opensearch-project/sql

Length of output: 50379


Inconsistent utcTimestamp value across test fixtures.

The utcTimestamp parameter in the base64-encoded script is a fixed test value, not derived from system time. However, this file has utcTimestamp=1765952821451990000 while all other expected output YAML files use utcTimestamp=0. Align this value with other test fixtures for consistency.

🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push7.yaml
around line 8, the embedded script JSON contains
utcTimestamp=1765952821451990000 which is inconsistent with other fixtures;
update the utcTimestamp value to 0 inside the embedded script payload so it
matches the other expected output files, and if the script is encoded/escaped in
the file re-generate the escaped/encoded string accordingly so the file contains
the same JSON with "utcTimestamp":0.

Comment on lines +4 to +8
LogicalSort(fetch=[5])
LogicalProject(age=[$8])
LogicalSort(fetch=[10])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find test files that reference this expected output file
rg -l "explain_limit_10_5_push" --type=java

Repository: opensearch-project/sql

Length of output: 128


🏁 Script executed:

#!/bin/bash
# Find the test case in ExplainIT.java that uses this expected output
rg -A 10 -B 5 "explain_limit_10_5_push" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java

Repository: opensearch-project/sql

Length of output: 662


🏁 Script executed:

#!/bin/bash
# Find the schema definition for the opensearch-sql_test_index_account table
# Look for test data or schema setup files
find integ-test -name "*.json" -o -name "*.yaml" | xargs grep -l "opensearch-sql_test_index_account" | head -5

Repository: opensearch-project/sql

Length of output: 540


🏁 Script executed:

#!/bin/bash
# Look for account index data structure or test setup
fd "account" --type f integ-test/src/test/resources | head -20

Repository: opensearch-project/sql

Length of output: 354


🏁 Script executed:

#!/bin/bash
# Read the account index mapping to verify field order
cat integ-test/src/test/resources/indexDefinitions/account_index_mapping.json | head -100

Repository: opensearch-project/sql

Length of output: 1020


🏁 Script executed:

#!/bin/bash
# Read the accounts test data to see the actual field order
cat integ-test/src/test/resources/accounts.json | head -3

Repository: opensearch-project/sql

Length of output: 314


🏁 Script executed:

#!/bin/bash
# Read the full YAML expected output to understand the complete plan structure
cat integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml

Repository: opensearch-project/sql

Length of output: 951


Correct the outer projection description.

The outer LogicalProject contains only age=[$8], not all 17 fields—the full-field projection is internal (below the inner LogicalSort(fetch=[10])). The plan correctly represents the query pipeline: scan → limit 10 → project age → limit 5. The field index age=[$8] is correct (age is the 9th field at 0-indexed position 8), and the physical plan correctly optimizes to fetch only 5 rows with the age field.

🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml
around lines 4-8, the outer LogicalProject is incorrectly showing the full
17-field projection; change it so the outer projection only lists the single
field age=[$8] (keeping the inner full-field LogicalProject under the inner
LogicalSort(fetch=[10]) intact) so the plan reads scan → LogicalSort(fetch=[10])
→ inner full projection → LogicalSort(fetch=[5]) → outer
LogicalProject(age=[$8]) to reflect limit 10 → project age → limit 5.

@yuancu yuancu added the calcite calcite migration releated label Dec 22, 2025
Comment on lines 15 to 16
// TODO: Refactor this match method
public boolean match(FunctionName functionName, List<RelDataType> argTypes) {
Collaborator

Is the match() method used?

Collaborator Author

No, it's no longer used. I removed it.

* internally.
*/
private final ImmutableMap<BuiltinFunctionName, List<Pair<CalciteFuncSignature, FunctionImp>>>
private final ImmutableMap<BuiltinFunctionName, Pair<CalciteFuncSignature, FunctionImp>>
Collaborator

Why is a FunctionRegistry required? Is it needed because CalciteRelNodeVisitor must resolve functions during analysis and translation?

Collaborator Author

@yuancu yuancu Dec 23, 2025

It's still used when building logical plans from PPL's AST. Yes, it's used in CalciteRelNodeVisitor, but now resolution is name-based only.

import org.opensearch.sql.calcite.utils.OpenSearchTypeFactory;

@UtilityClass
public class ValidationUtils {
Collaborator

Could you add comments/JavaDoc where a function is copied from Calcite?

Collaborator Author

Yes, I annotated every method I copied from Calcite's implementation

* }</pre>
*/
@Override
protected Result visitAntiOrSemiJoin(Join e) {
Collaborator

Could adding a qualifier (e.g. t0.name) also solve the issue? Then we could avoid the ANTI JOIN visit in PplSqlToRelConverter?

Collaborator Author

@yuancu yuancu Dec 23, 2025

Yes, adding a qualifier does solve the problem.

RCA

I investigated this query: source = opensearch-sql_test_index_state_country | where country = 'Canada' OR country = 'England' | left semi join left=a, right=b ON a.name = b.name opensearch-sql_test_index_occupation | sort a.age. Its original logical plan from CalciteRelNodeVisitor is:

 LogicalSort(sort0=[$5], dir0=[ASC-nulls-first])
  LogicalJoin(condition=[=($0, $6)], joinType=[semi])
    LogicalProject(name=[$0], country=[$1], state=[$2], month=[$3], year=[$4], age=[$5])
      LogicalFilter(condition=[SEARCH($1, Sarg['Canada':VARCHAR, 'England':VARCHAR]:VARCHAR)])
        CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_state_country]])
    LogicalProject(name=[$0], country=[$1], occupation=[$2], month=[$3], salary=[$4], year=[$5])
      CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_occupation]])

Its converted SQL, however, is:

SELECT `name`, `country`, `state`, `month`, `year`, `age`
FROM `OpenSearch`.`opensearch-sql_test_index_state_country` AS `t0`
WHERE `country` IN ('Canada', 'England') AND EXISTS (SELECT 1
FROM (SELECT `name`, `country`, `occupation`, `month`, `salary`, `year`
FROM `OpenSearch`.`opensearch-sql_test_index_occupation`) AS `t1`
WHERE `name` = `t1`.`name`)  -- left `name` is unqualified
ORDER BY `age`

Why is the first name unqualified

In RelToSqlConverter, the condition is initially created with qualifier WHERE t0.name = t1.name:

SqlNode sqlCondition =
    convertConditionToSqlNode(e.getCondition(), leftContext, rightContext);

The following lines remove the qualifier because they attempt to create a correlated subquery, and in standard SQL, outer-query references in correlated subqueries are typically unqualified.

if (leftResult.neededAlias != null) {
  SqlShuttle visitor =
      new AliasReplacementShuttle(leftResult.neededAlias,
          e.getLeft().getRowType(), sqlSelect.getSelectList());
  sqlCondition = sqlCondition.accept(visitor);
}

The visitor replaces the identifiers whose first qualifier equals leftResult.neededAlias:

@Override public SqlNode visit(SqlIdentifier id) {
...
  if (tableAlias.equals(id.names.get(0))) {
    int index =
        requireNonNull(tableType.getField(id.names.get(1), false, false),
            () -> "field " + id.names.get(1) + " is not found in "
                + tableType)
        .getIndex();
    SqlNode selectItem = source.get(index);
    if (selectItem.getKind() == SqlKind.AS) {
      selectItem = ((SqlCall) selectItem).operand(0);
    }
    return selectItem.clone(id.getParserPosition());
...
}

This replacement shuttle exists because the alias of the outer query (t0) does not exist under some conditions. However, SQL's name resolution resolves an unqualified name against the innermost scope first, and such a name happens to exist in the innermost scope, so the condition ends up being incorrectly resolved to t1.name = t1.name.

Resolution

It can be resolved by either removing the replacement shuttle or enabling ANTI / SEMI join syntax. But the former solution risks encountering a non-existent t0 alias if the following condition is not met:

    if (leftResult.neededAlias != null && sqlSelect.getFrom() != null) {
      sqlSelect.setFrom(as(sqlSelect.getFrom(), leftResult.neededAlias));
    }

(Note: the above condition is updated by CALCITE-3810: Render ANTI and SEMI join to NOT EXISTS and EXISTS in the JDBC adapter. The latest update may have resolved this problem, but the update is not included in Calcite 1.41)

I think adopting the second solution, enabling ANTI/SEMI join, is better:

  • It's safer because we will not run into trouble when the above condition is not met.

  • Spark SQL supports ANTI and SEMI JOIN

  • As we are creating ANTI / SEMI JOIN in CalciteRelNodeVisitor already, it will be more consistent if the logical plan translated back also contains them.

    if (node.getJoinType() == SEMI || node.getJoinType() == ANTI) {
      // semi and anti join only return left table outputs
      context.relBuilder.join(
          JoinAndLookupUtils.translateJoinType(node.getJoinType()), joinCondition);
      return context.relBuilder.peek();
    }


Comment on lines +45 to +48
@Override
public SqlKind getKind() {
return SqlKind.DIVIDE;
}
Collaborator Author

This is to enable operator-dependent type derivation.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

402-429: Document the intentional re-registration design or prevent duplicate registrations at initialization.

The code logs warnings when functions are registered multiple times but continues with silent overwrites. Since OpenSearchExecutionEngine is instantiated per-request and calls registerOpenSearchFunctions() in its constructor, external functions (GEOIP, DISTINCT_COUNT_APPROX) are re-registered on every request, generating repeated warnings and making actual configuration errors difficult to detect.

Either:

  1. Document that re-registration is intentional for plugin override scenarios
  2. Ensure functions are registered only once during startup (e.g., in a static initializer or plugin loader rather than per-request)
  3. Make duplicate registrations an error to fail fast on configuration mistakes
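
A minimal sketch of option 2 above, assuming registration can be funneled through a process-wide guard; the class and method names are illustrative, and registerOpenSearchFunctions refers to the engine method mentioned in this comment.

import java.util.concurrent.atomic.AtomicBoolean;

final class ExternalFunctionBootstrapExample {
  private static final AtomicBoolean REGISTERED = new AtomicBoolean(false);

  /** Runs the registration exactly once per JVM, no matter how many engines are created. */
  static void registerOnce(Runnable registerExternalFunctions) {
    if (REGISTERED.compareAndSet(false, true)) {
      registerExternalFunctions.run();
    }
  }

  private ExternalFunctionBootstrapExample() {}
}

OpenSearchExecutionEngine's constructor could then call ExternalFunctionBootstrapExample.registerOnce(this::registerOpenSearchFunctions), so repeated engine construction no longer re-registers GEOIP or DISTINCT_COUNT_APPROX.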
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a98c3a and e156d0f.

📒 Files selected for processing (4)
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
💤 Files with no reviewable changes (1)
  • core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
  • PPLOperandTypes (20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (14)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (2)

9-62: LGTM: Imports align with validation pipeline.

The new imports for Calcite's validation infrastructure (SqlValidator, SqlNode, SqlToRelConverter, etc.) and PPL-specific validation components are appropriate for the validation pipeline being introduced.


124-125: LGTM: Clean validation integration.

The validation step is consistently integrated into both execute and explain paths, positioned logically between filter merging and optimization.

Also applies to: 166-167

core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (4)

1-58: LGTM: Well-structured class declaration.

The class follows proper patterns with a factory method, extends the appropriate Calcite base class, and includes clear documentation.


81-90: LGTM: Proper UDT handling.

The override correctly identifies cases where UDTs require casting from character types, addressing a gap in the base implementation.


92-141: LGTM: Datetime and IP comparison handling.

The custom comparison type resolution correctly handles datetime cross-type comparisons and IP-to-string comparisons as described in the PR objectives. The nullability is properly preserved.

Note: The previous review comment about using && instead of & on line 120 has been addressed.


143-181: LGTM: Robust operand coercion.

The method properly handles dynamic parameters, nullability synchronization, and uses SAFE_CAST to avoid malformed data errors as mentioned in the PR objectives.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (8)

256-256: LGTM: Import additions are appropriate.

The new imports (Locale and OpenSearchTypeUtil) are properly used in the code. Locale is used for internationalized string formatting, and OpenSearchTypeUtil supports the runtime type detection for ADD operations.

Also applies to: 284-284


359-394: LGTM: Registry simplification aligns with Calcite validation migration.

The migration from signature-based function lists to single-implementation-per-function simplifies the registry structure. Thread safety is maintained: immutable maps for internal registries and ConcurrentHashMap for external ones.


431-478: LGTM: Resolution logic correctly simplified.

The resolution logic is now straightforward—direct name-based lookup and invocation. Type checking has been migrated to Calcite's validation phase (before resolution), eliminating the need for signature matching loops. The compulsoryCast method remains as a documented ad-hoc workaround for the REDUCE function's specific casting requirements.


501-560: LGTM: Type checker migration to Calcite completed correctly.

The migration from PPLTypeChecker to SqlOperandTypeChecker is properly implemented. The extractTypeCheckerFromUDF helper correctly handles both user-defined and built-in operators. Note that some registrations pass null as the type checker (e.g., for JSON_ARRAY, JSON_OBJECT, TYPEOF), which is acceptable for functions with custom validation logic.


562-703: LGTM: Standard operator registration follows correct patterns.

The registration of comparison and standard operators correctly uses Calcite's type checkers. The SUBTRACT operator properly supports both numeric and datetime operands (for timestamp binning). The use of composite type checkers with .or() follows Calcite conventions.


811-867: LGTM: Function implementations handle edge cases correctly.

The TOSTRING, MVJOIN, SPLIT, and MVINDEX implementations properly handle their respective requirements:

  • TOSTRING supports both 1-arg (cast) and 2-arg (format) forms
  • SPLIT correctly handles the empty delimiter edge case using REGEXP_EXTRACT_ALL
  • MVINDEX supports both 2-arg and 3-arg variants with appropriate type checking

926-1228: LGTM: Remaining function registrations and builder classes are consistent.

The rest of the populate() method and builder classes follow consistent patterns:

  • Type checkers properly use Calcite's OperandTypes and SqlTypeFamily
  • IS_NOT_NULL/IS_NULL correctly use IGNORE family to support struct types (good fix)
  • Internal builder throws on duplicates (deterministic), while external registry warns (flexible for plugins)—this distinction is appropriate
  • Aggregate function registrations consistently use SqlOperandTypeChecker

914-925: Document the ADD operator's dual-behavior type checking pattern.

The ADD operator uses SqlStdOperatorTable.PLUS.getOperandTypeChecker() but implements runtime type detection to choose between CONCAT (for strings) and PLUS (for numbers). While the inline comment explains this is intentional to avoid confusion during function resolution, add a Javadoc comment or expanded code comment explaining:

  1. Why PLUS's type checker is used despite ADD accepting character types
  2. Whether Calcite's validation phase permits string operands to reach this runtime resolution, or if there's custom validator logic that handles this case
  3. Reference this as a Calcite-specific workaround pattern for dual-behavior operators
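
A sketch of what that expanded comment could say; the explanation of why string operands survive validation is left as a placeholder, since that is exactly the open question above.

  // Calcite workaround (dual-behavior operator): ADD deliberately reuses
  // SqlStdOperatorTable.PLUS.getOperandTypeChecker() so that function resolution is not
  // confused by a separate string signature for `+`. Character operands are expected to
  // reach this point via <confirm: PPL's validation/coercion layer>, and the runtime check
  // below then dispatches to CONCAT for strings and PLUS for numbers.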

Comment on lines +60 to +75
@Override
public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) {
RelDataType casted = super.implicitCast(in, expected);
if (casted == null) {
// String -> DATETIME is converted to String -> TIMESTAMP
if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) {
return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP);
}
return null;
}
return switch (casted.getSqlTypeName()) {
case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
createUDTWithAttributes(factory, in, casted.getSqlTypeName());
default -> casted;
};
}
Contributor

⚠️ Potential issue | 🟠 Major

Replace switch expression with Java 11 compatible syntax.

The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x.

🔎 Proposed fix using traditional switch statement
-    return switch (casted.getSqlTypeName()) {
-      case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
-          createUDTWithAttributes(factory, in, casted.getSqlTypeName());
-      default -> casted;
-    };
+    switch (casted.getSqlTypeName()) {
+      case DATE:
+      case TIME:
+      case TIMESTAMP:
+      case BINARY:
+        return createUDTWithAttributes(factory, in, casted.getSqlTypeName());
+      default:
+        return casted;
+    }

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 60 to 75, replace the Java 14+ switch expression with a Java
11-compatible traditional switch statement: compute the result into a local
RelDataType variable (e.g., result), use a traditional
switch(casted.getSqlTypeName()) with case labels for DATE, TIME, TIMESTAMP,
BINARY that set result = createUDTWithAttributes(factory, in,
casted.getSqlTypeName()); and a default that sets result = casted; then return
result; ensuring all code paths assign result before the return.

Comment on lines +183 to +202
private static SqlNode castTo(SqlNode node, RelDataType type) {
if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
return switch (exprType) {
case ExprCoreType.DATE ->
PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
case ExprCoreType.TIMESTAMP ->
PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
case ExprCoreType.TIME ->
PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
};
}
// Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted
return SqlLibraryOperators.SAFE_CAST.createCall(
node.getParserPosition(),
node,
SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable()));
}
Contributor

⚠️ Potential issue | 🟠 Major

Replace switch expression with Java 11 compatible syntax.

The switch expression on lines 186-195 uses Java 14+ syntax. This needs to be converted for Java 11 compatibility.

🔎 Proposed fix using traditional switch statement
   private static SqlNode castTo(SqlNode node, RelDataType type) {
     if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
       ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
-      return switch (exprType) {
-        case ExprCoreType.DATE ->
-            PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
-        case ExprCoreType.TIMESTAMP ->
-            PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
-        case ExprCoreType.TIME ->
-            PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
-        case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
-        default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
-      };
+      if (exprType == ExprCoreType.DATE) {
+        return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.TIMESTAMP) {
+        return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.TIME) {
+        return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.IP) {
+        return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
+      } else {
+        throw new UnsupportedOperationException("Unsupported type: " + exprType);
+      }
     }

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 183 to 202, the code uses a Java 14+ switch expression for mapping
ExprType to SqlNode; replace it with a Java 11-compatible traditional switch
statement: declare a SqlNode result variable, switch on exprType with case
labels that assign result = PPLBuiltinOperators.X.createCall(...); include a
default case that throws the same UnsupportedOperationException, then break each
case and after the switch return result; keep the existing parser position and
node arguments and preserve surrounding logic (including the SAFE_CAST return
for non-datetime/ip types).
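
For reference, a minimal sketch of the switch-statement variant the prompt describes (distinct from the if-else chain proposed above). The helper method name and the narrowing of the argument to the ExprCoreType enum are assumptions for illustration, not part of the PR:

private static SqlNode castDatetimeOrIp(SqlNode node, ExprCoreType exprType) {
  // Classic Java 11 switch statement over the enum, assigning into a result variable.
  SqlNode result;
  switch (exprType) {
    case DATE:
      result = PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
      break;
    case TIMESTAMP:
      result = PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
      break;
    case TIME:
      result = PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
      break;
    case IP:
      result = PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
      break;
    default:
      throw new UnsupportedOperationException("Unsupported type: " + exprType);
  }
  return result;
}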

Comment on lines +294 to +352
  /**
   * Validates a RelNode by converting it to SqlNode, performing validation, and converting back.
   *
   * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on
   * PPL queries.
   *
   * @param relNode the relation node to validate
   * @param context the Calcite plan context containing the validator
   * @return the validated (and potentially modified) relation node
   */
  private RelNode validate(RelNode relNode, CalcitePlanContext context) {
    SkipRelValidationShuttle skipShuttle = new SkipRelValidationShuttle();
    relNode.accept(skipShuttle);
    if (skipShuttle.shouldSkipValidation()) {
      return relNode;
    }
    // Fix interval literals before conversion to SQL
    RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, true));

    // Convert RelNode to SqlNode for validation
    RelToSqlConverter rel2sql = new PplRelToSqlNodeConverter(OpenSearchSparkSqlDialect.DEFAULT);
    SqlImplementor.Result result = rel2sql.visitRoot(sqlRelNode);
    SqlNode root = result.asStatement();

    // Rewrite SqlNode to remove database qualifiers
    SqlNode rewritten = root.accept(new SqlRewriteShuttle());
    SqlValidator validator = context.getValidator();
    try {
      validator.validate(Objects.requireNonNull(rewritten));
    } catch (CalciteContextException e) {
      if (ValidationUtils.tolerantValidationException(e)) {
        return relNode;
      }
      throw new ExpressionEvaluationException(e.getMessage(), e);
    }

    SqlToRelConverter.Config sql2relConfig =
        SqlToRelConverter.config()
            // Do not remove sort in subqueries so that the orders for queries like `... | sort a
            // | fields b` is preserved
            .withRemoveSortInSubQuery(false)
            // Disable automatic JSON_TYPE_OPERATOR wrapping for nested JSON functions.
            // See CALCITE-4989: Calcite wraps nested JSON functions with JSON_TYPE by default
            .withAddJsonTypeOperatorEnabled(false)
            // Set hint strategy so that hints can be properly propagated.
            // See SqlToRelConverter.java#convertSelectImpl
            .withHintStrategyTable(PPLHintStrategyTable.getHintStrategyTable());
    SqlToRelConverter sql2rel =
        new PplSqlToRelConverter(
            context.config.getViewExpander(),
            validator,
            validator.getCatalogReader().unwrap(CalciteCatalogReader.class),
            context.relBuilder.getCluster(),
            PplConvertletTable.INSTANCE,
            sql2relConfig);
    // Convert the validated SqlNode back to RelNode
    RelNode validatedRel = sql2rel.convertQuery(rewritten, false, true).project();
    return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, false));
  }

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find ValidationUtils.java file
find . -name "ValidationUtils.java" -type f 2>/dev/null

# Search for tolerantValidationException method usage
rg -n "tolerantValidationException" --type=java -B2 -A5

Repository: opensearch-project/sql

Length of output: 1922


🏁 Script executed:

#!/bin/bash
# List all Java files in executor module that might contain ValidationUtils
fd "ValidationUtils" . --type f --extension java

# Also search for the method definition and all its usages
rg -n "tolerantValidationException\s*\(" --type=java -A3 -B2

Repository: opensearch-project/sql

Length of output: 1782


🏁 Script executed:

#!/bin/bash
# Find all Validation-related classes
fd -i "validation" . --type f --extension java | head -20

# Search for CalciteContextException handling
rg "CalciteContextException" --type=java -n -B3 -A5

# Look for logging patterns in the validate method
rg -n "log\." core/src/main/java/org/opensearch/sql/executor/QueryService.java 2>/dev/null || echo "File check"

Repository: opensearch-project/sql

Length of output: 9799


🏁 Script executed:

#!/bin/bash
# Find the QueryService.java file and examine the validate method context
rg -n "class QueryService" --type=java -A200 | grep -A50 "private RelNode validate"

# Find ValidationUtils and tolerantValidationException
rg -rn "class ValidationUtils" --type=java

# Check for any logging configuration related to validation
rg -n "logger\|log\.info\|log\.debug" core/src/main/java/org/opensearch/sql/executor/QueryService.java

Repository: opensearch-project/sql

Length of output: 152


🏁 Script executed:

#!/bin/bash
# Search for ValidationUtils implementation
find . -path "*/src/*" -name "*.java" -exec grep -l "ValidationUtils" {} \;

# Get the file location and content
rg -n "class ValidationUtils|static.*tolerantValidationException" --type=java --max-count=5

Repository: opensearch-project/sql

Length of output: 1172


🏁 Script executed:

#!/bin/bash
# Find all files mentioning ValidationUtils
rg -l "ValidationUtils" --type=java | head -10

# For each file, search for the method
for file in $(rg -l "ValidationUtils" --type=java | head -5); do
  echo "=== File: $file ==="
  rg -A10 "tolerantValidationException" "$file"
done

Repository: opensearch-project/sql

Length of output: 1416


🏁 Script executed:

#!/bin/bash
# Find where QueryService.java is and look at the surrounding code
find . -name "QueryService.java" -type f 2>/dev/null | head -1 | xargs -I {} bash -c 'wc -l {} && echo "File found" && grep -n "tolerantValidationException" {}'

# Search for ValidationUtils definition
rg -n "class ValidationUtils" --type=java
rg -n "static.*boolean.*tolerantValidationException" --type=java

Repository: opensearch-project/sql

Length of output: 468


🏁 Script executed:

#!/bin/bash
# Find ValidationUtils class
rg -l "class ValidationUtils" --type=java

# Show the full implementation
rg -A 30 "static.*tolerantValidationException" --type=java

Repository: opensearch-project/sql

Length of output: 1202


🏁 Script executed:

#!/bin/bash
# Get the complete ValidationUtils.java file
cat core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java

Repository: opensearch-project/sql

Length of output: 7184


🏁 Script executed:

#!/bin/bash
# Get more context around the tolerantValidationException method
head -170 core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java | tail -30

Repository: opensearch-project/sql

Length of output: 1190


Add logging when validation is skipped for known Calcite compatibility issues.

The tolerant validation exception handling is well-documented and properly scoped to specific Calcite bugs (aggregate nesting and windowed aggregates in GROUP BY). However, when the fallback to the original RelNode occurs at line 325, there is no logging to track when queries bypass validation. Add a debug or warn log statement to improve observability and aid troubleshooting when these edge cases are encountered.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/executor/QueryService.java around lines
294 to 352, when validator.validate(...) throws a tolerant Calcite exception the
code currently returns the original relNode without any observability; add a log
entry (debug or warn based on severity policy) inside the catch branch where
ValidationUtils.tolerantValidationException(e) is true before returning relNode,
including the exception message and context (e.g., which validation rule was
tolerated and a short identifier of the relNode or query) so operators can see
when validation was skipped.
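
A minimal sketch of what that logging could look like inside the catch block quoted above, assuming QueryService has (or gains) a Log4j 2 Logger field named log, which is not shown in this PR:

    } catch (CalciteContextException e) {
      if (ValidationUtils.tolerantValidationException(e)) {
        // Surface that validation was skipped so the fallback path is observable.
        log.warn(
            "Skipping SqlNode validation for {} due to tolerated Calcite exception: {}",
            relNode.getRelTypeName(),
            e.getMessage());
        return relNode;
      }
      throw new ExpressionEvaluationException(e.getMessage(), e);
    }

A debug level may be preferable if these tolerated exceptions turn out to be frequent in practice.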

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@yuancu
Collaborator Author

yuancu commented Dec 23, 2025

@coderabbitai generate docstrings

@coderabbitai
Contributor

coderabbitai bot commented Dec 23, 2025

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Contributor

coderabbitai bot commented Dec 23, 2025

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #4989

coderabbitai bot added a commit that referenced this pull request Dec 23, 2025
Docstrings generation was requested by @yuancu.

* #4892 (comment)

The following files were modified:

* `core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java`
* `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
* `core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java`
* `core/src/main/java/org/opensearch/sql/calcite/ExtendedRexBuilder.java`
* `core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java`
* `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java`
* `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java`
* `core/src/main/java/org/opensearch/sql/calcite/utils/binning/BinnableField.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/SqlOperatorTableProvider.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java`
* `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java`
* `core/src/main/java/org/opensearch/sql/executor/QueryService.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ExistsFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ForallFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapRemoveFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java`
* `core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java`
* `core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java`
* `core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonAppendFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonKeysFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/math/DivideFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMaxFunction.java`
* `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java`
* `core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/clickbench/PPLClickBenchIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendcolIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLExplainIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java`
* `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java`
* `integ-test/src/test/java/org/opensearch/sql/ppl/DateTimeFunctionIT.java`
* `integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java`
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)

60-75: Replace switch expression with Java 11 compatible syntax.

The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x. This issue was previously flagged but has not yet been addressed.

🔎 Proposed fix using traditional switch statement
   @Override
   public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) {
     RelDataType casted = super.implicitCast(in, expected);
     if (casted == null) {
       // String -> DATETIME is converted to String -> TIMESTAMP
       if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) {
         return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP);
       }
       return null;
     }
-    return switch (casted.getSqlTypeName()) {
-      case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
-          createUDTWithAttributes(factory, in, casted.getSqlTypeName());
-      default -> casted;
-    };
+    switch (casted.getSqlTypeName()) {
+      case DATE:
+      case TIME:
+      case TIMESTAMP:
+      case BINARY:
+        return createUDTWithAttributes(factory, in, casted.getSqlTypeName());
+      default:
+        return casted;
+    }
   }

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.


183-202: Replace switch expression with Java 11 compatible syntax.

The switch expression on lines 186-195 uses Java 14+ syntax. This needs to be converted for Java 11 compatibility. This issue was previously flagged but has not yet been addressed.

🔎 Proposed fix using if-else chain
   private static SqlNode castTo(SqlNode node, RelDataType type) {
     if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
       ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
-      return switch (exprType) {
-        case ExprCoreType.DATE ->
-            PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
-        case ExprCoreType.TIMESTAMP ->
-            PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
-        case ExprCoreType.TIME ->
-            PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
-        case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
-        default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
-      };
+      if (exprType == ExprCoreType.DATE) {
+        return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.TIMESTAMP) {
+        return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.TIME) {
+        return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
+      } else if (exprType == ExprCoreType.IP) {
+        return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
+      } else {
+        throw new UnsupportedOperationException("Unsupported type: " + exprType);
+      }
     }
     // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted
     return SqlLibraryOperators.SAFE_CAST.createCall(
         node.getParserPosition(),
         node,
         SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable()));
   }

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.

🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)

56-58: Consider adding JavaDoc to the public constructor.

While the factory method above has good documentation, coding guidelines require JavaDoc on all public methods. A brief description would improve API documentation completeness.

📝 Suggested JavaDoc
+  /**
+   * Constructs a PplTypeCoercion instance.
+   *
+   * @param typeFactory the type factory
+   * @param validator the SQL validator
+   */
   public PplTypeCoercion(RelDataTypeFactory typeFactory, SqlValidator validator) {
     super(typeFactory, validator);
   }

As per coding guidelines, all public classes and methods must have proper JavaDoc.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)

402-408: Consider making duplicate registration warnings more actionable.

The duplicate registration warning is helpful, but could guide developers on next steps.

🔎 Suggested enhancement to warning message
     if (externalFunctionRegistry.containsKey(functionName)) {
       logger.warn(
-          String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
+          String.format(
+              Locale.ROOT,
+              "Function %s is already registered and will be overwritten. Consider using a unique function name.",
+              functionName));
     }

914-925: Improve comment clarity for the ADD operator runtime selection logic.

The dynamic selection between CONCAT and PLUS based on operand types should be clarified. Since RexNode.getType() is guaranteed to return non-null (per Calcite contract), the current implementation is safe without additional filtering. However, a comment explaining the runtime selection rationale would improve maintainability.

Consider expanding the comment to explain that CONCAT is selected when all operands are character types, otherwise PLUS is used for numeric addition. This clarifies the intent for future maintainers.
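
As an illustration of that selection rule (not the PR's actual code), a self-contained sketch in Calcite terms; the class and method names are invented for the example, and two operands are assumed for the PLUS case:

import java.util.List;
import org.apache.calcite.rex.RexBuilder;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.fun.SqlLibraryOperators;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.type.SqlTypeUtil;

/** Illustrative only: resolves '+' to string concatenation or numeric addition. */
final class AddOperatorSelectionSketch {
  private AddOperatorSelectionSketch() {}

  /** Uses CONCAT when every operand is a character type, numeric PLUS otherwise. */
  static RexNode resolveAdd(RexBuilder builder, List<RexNode> args) {
    boolean allCharacter = args.stream().allMatch(arg -> SqlTypeUtil.isCharacter(arg.getType()));
    SqlOperator op =
        allCharacter ? SqlLibraryOperators.CONCAT_FUNCTION : SqlStdOperatorTable.PLUS;
    return builder.makeCall(op, args);
  }
}

A comment along these lines next to the registration in PPLFuncImpTable would capture the intent the review asks for.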

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e156d0f and aff8541.

📒 Files selected for processing (5)
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
  • core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
  • core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
💤 Files with no reviewable changes (1)
  • core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • core/src/main/java/org/opensearch/sql/executor/QueryService.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (3)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
  • PPLFuncImpTable (291-1269)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (62-394)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
  • PplTypeCoercionRule (45-107)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java (1)
  • UserDefinedFunctionUtils (52-320)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
  • PPLOperandTypes (20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: WhiteSource Security Check
🔇 Additional comments (19)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (5)

19-28: LGTM! Simplified registry access.

The change from Pair<?, AggHandler> to direct AggHandler storage simplifies the API and eliminates unnecessary wrapping. The reflection-based approach is appropriate for test utilities that need to verify internal registry state.


30-39: Consistent with simplified registry type.

The method correctly uses the new Map<BuiltinFunctionName, AggHandler> type and directly accesses the handler without Pair extraction. The assertions remain appropriate and meaningful.


47-54: Direct handler access is cleaner.

Line 50 demonstrates the benefit of the simplified registry—direct assignment of AggHandler without needing to extract from a Pair. This makes the code more straightforward and easier to understand.


56-70: Consistent type updates.

Both methods correctly use the updated Map<BuiltinFunctionName, AggHandler> type. Since these methods primarily operate on registry keys and size, the value type simplification has minimal impact while maintaining consistency across the test base.


72-98: Unaffected helper methods remain correct.

These utility methods operate independently of the aggregate function registry and are unaffected by the type simplification. They continue to provide useful validation for function name resolution and enum mapping.

core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (5)

36-54: LGTM!

The class structure and factory method are well-designed. The JavaDoc clearly explains the purpose and the blacklist approach for PPL-specific type coercion.


77-90: LGTM!

The special handling for user-defined types (UDTs) is well-documented and necessary to prevent UDTs from being incorrectly treated as character types during cast validation.


92-111: LGTM!

The datetime-string equality handling is well-implemented, correctly managing bidirectional comparisons and preserving UDT types when appropriate.


113-141: LGTM!

The common type resolution for binary comparisons correctly implements PPL-specific rules:

  • Date/time combinations coerce to timestamp
  • IP/string comparisons coerce to IP
  • Nullability is properly preserved

143-181: LGTM!

The operand coercion logic is well-structured with proper handling of:

  • Dynamic parameters (correctly skipped)
  • Attribute synchronization
  • SAFE_CAST nullability propagation
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (9)

359-394: Registry simplification aligns well with Calcite validation migration.

The migration from List<Pair<CalciteFuncSignature, FunctionImp>> to single FunctionImp per function name is a significant architectural improvement that complements the new Calcite-based type checking and coercion. The simplified initialization logic using putAll is cleaner and more maintainable.

Based on learnings, this follows the pattern of leveraging Calcite's built-in mechanisms rather than maintaining custom parallel systems.


460-478: Simplified resolution logic is cleaner and more performant.

The removal of signature list iteration and coercion logic at this layer makes sense given that type checking and coercion are now handled by Calcite's validation pipeline. The direct implementation lookup is more efficient.


504-560: Type checker migration to SqlOperandTypeChecker is well-structured.

The updated signatures using SqlOperandTypeChecker and the helper method extractTypeCheckerFromUDF provide clean integration with Calcite's type system. The direct use of OperandTypes.family() in registerDivideFunction is more maintainable than wrapped custom type checkers.


813-821: TOSTRING implementation correctly handles both signatures.

The inlined implementation cleanly handles the 1-argument case (direct cast to VARCHAR) and 2-argument case (delegates to PPLBuiltinOperators.TOSTRING). The type checker using .or() correctly validates both signatures.


1082-1110: Builder simplifications are consistent with the single-implementation model.

The Builder and AggBuilder now enforce single registration per function name, which is appropriate for the new architecture. The exception thrown on duplicate internal registration (line 1091) correctly prevents configuration errors during initialization.


1137-1191: Aggregate function registrations correctly use type checkers.

The aggregate registrations properly extract type checkers from UDFs using extractTypeCheckerFromUDF() and use appropriate OperandTypes for built-in aggregates. The MEDIAN implementation's reuse of PERCENTILE_APPROX with a fixed percentile value is elegant.


1246-1268: Excellent JavaDoc for extractTypeCheckerFromUDF.

The comprehensive JavaDoc clearly explains the purpose of unwrapping UDFOperandMetadata to extract the underlying SqlOperandTypeChecker and references the relevant PPLOperandTypes class. This documentation is valuable given the method's increased usage across the refactored registration logic.


994-1009: The SUBSTRING and SUBSTR type checkers are correctly implemented and documented (a sketch of this composition pattern follows at the end of these comments).

Both functions properly register with custom type checkers using OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER).or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER, SqlTypeFamily.INTEGER)) to support both 2-arg and 3-arg variants. This aligns with Calcite's SUBSTRING(string FROM start [FOR length]) pattern and matches OpenSearch PPL's substring(str, start) or substring(str, start, length) syntax. The inline comment in the code already documents that manual type checker creation is necessary since SqlStdOperatorTable.SUBSTRING.getOperandTypeChecker() is null. Test coverage exists for both variants (e.g., AstBuilderTest.java validates 3-arg form, SemanticAnalyzerScalarFunctionTest.java covers multiple cases).


564-581: Verify OperandTypes type family handling with OpenSearch datetime and temporal types.

The direct OperandTypes.family(SqlTypeFamily.DATETIME, SqlTypeFamily.DATETIME) registration at lines 580-581 is correct. OpenSearch-specific UDT types (TIMESTAMP, TIME, DATE, BINARY) are properly mapped to SqlTypes through the PplValidator's type derivation layer (sqlTypeToUserDefinedType() in PplValidator.java and createUDTWithAttributes() in ValidationUtils.java), which handles the integration between Calcite's standard type families and OpenSearch's custom types during validation.
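
For reference, a minimal sketch of the composed operand-type checker pattern discussed in the SUBSTRING/SUBSTR and TOSTRING comments above; the wrapper class is invented for the example and is not part of the PR:

import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlOperandTypeChecker;
import org.apache.calcite.sql.type.SqlTypeFamily;

/** Illustrative holder for a checker accepting both the 2-arg and 3-arg substring forms. */
final class SubstringOperandTypesSketch {
  private SubstringOperandTypesSketch() {}

  /** Accepts substring(str, start) as well as substring(str, start, length). */
  static SqlOperandTypeChecker checker() {
    return OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER)
        .or(
            OperandTypes.family(
                SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER, SqlTypeFamily.INTEGER));
  }
}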

…king is deferred to the validation phase against the finished logical plan

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

397-424: Add null validation for external registration parameters.

Both registerExternalOperator and registerExternalAggOperator should validate that the operator and aggFunction parameters are not null before registration. Currently, null operators would cause NPE during function resolution.

🔎 Proposed null checks
 public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+  if (operator == null) {
+    throw new IllegalArgumentException("Operator cannot be null");
+  }
   if (externalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
   }
   externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
 }

 public void registerExternalAggOperator(
     BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
+  if (aggFunction == null) {
+    throw new IllegalArgumentException("AggFunction cannot be null");
+  }
   if (aggExternalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(
             Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
   }
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)

48-52: Consider extracting UDT type-checking pattern.

Multiple methods repeat the pattern of checking isUserDefinedType, casting to AbstractExprRelDataType<?>, and then accessing getUdt() or getExprType(). While not critical, consider extracting this pattern to reduce duplication.

💡 Example refactor

Add a private helper method:

private static <T> Optional<T> getUdtAttribute(RelDataType type, Function<AbstractExprRelDataType<?>, T> extractor) {
  if (isUserDefinedType(type)) {
    return Optional.of(extractor.apply((AbstractExprRelDataType<?>) type));
  }
  return Optional.empty();
}

Then refactor methods like isDate:

public static boolean isDate(RelDataType type) {
  return getUdtAttribute(type, AbstractExprRelDataType::getUdt)
      .map(udt -> udt == OpenSearchTypeFactory.ExprUDT.EXPR_DATE)
      .orElse(SqlTypeName.DATE.equals(type.getSqlTypeName()));
}

Note: This is a stylistic suggestion; the current implementation is also acceptable.

Also applies to: 72-78, 93-96, 111-115, 130-134, 156-158, 172-175


33-55: Clarify JavaDoc for mixed numeric/character checking.

The method checks for both numeric and character types among built-in SQL types, but only numeric types among UDTs. The JavaDoc could make this asymmetry more explicit to avoid confusion.

🔎 Suggested clarification
   /**
-   * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types
-   * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric
-   * types, and string types (VARCHAR, CHAR).
+   * Checks if the RelDataType represents a numeric or character type.
+   *
+   * <p>For built-in SQL types, this method checks both numeric types (INTEGER, BIGINT, SMALLINT,
+   * TINYINT, FLOAT, DOUBLE, DECIMAL, REAL) and character types (VARCHAR, CHAR). For OpenSearch
+   * UDTs, only numeric types are checked (character UDTs like VARCHAR-mapped types are excluded).
    *
    * @param fieldType the RelDataType to check
-   * @return true if the type is numeric or string, false otherwise
+   * @return true if the type is numeric (built-in or UDT) or character (built-in only), false
+   *     otherwise
    */
   public static boolean isNumericOrCharacter(RelDataType fieldType) {
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)

479-493: Add bounds checking in compulsoryCast.

The REDUCE case accesses args[1] and args[2] without verifying that the array has at least 3 elements. While Calcite's validation should ensure correct arity, defensive bounds checking would prevent potential ArrayIndexOutOfBoundsException and provide clearer error messages.

🔎 Proposed bounds check
   switch (functionName) {
     case BuiltinFunctionName.REDUCE:
+      if (args.length < 3) {
+        throw new IllegalArgumentException(
+            String.format("REDUCE requires at least 3 arguments, got %d", args.length));
+      }
       // Set the second argument to the return type of the lambda function, so that
       // code generated with linq4j can correctly accumulate the result.
       RexLambda call = (RexLambda) args[2];
       args[1] = builder.makeCast(call.getType(), args[1], true, true);
       break;

286-1114: Consider adding class-level JavaDoc and monitoring file size.

The PPLFuncImpTable class serves as a central registry for PPL function implementations and plays a critical role in the Calcite integration. Consider adding comprehensive class-level JavaDoc that explains:

  • The migration from multi-signature to single-implementation model
  • How Calcite's validation handles type checking and coercion
  • The relationship between internal and external registries
  • When to use registerExternalOperator vs internal registration

The file currently exceeds 1000 lines, which may warrant future refactoring to extract operator groups (e.g., date/time operators, string operators) into separate builder classes.

As per coding guidelines, classes over 500 lines may need organization review.
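
One possible shape for that class-level JavaDoc, drawing only on behavior described in this review; the wording is illustrative rather than taken from the PR:

/**
 * Central registry mapping each PPL {@code BuiltinFunctionName} to a single function
 * implementation used during Calcite planning.
 *
 * <p>Operand type checking and implicit coercion are delegated to Calcite's validation phase
 * rather than resolved against per-signature entries in this table.
 *
 * <p>Data-storage-dependent implementations registered via {@code registerExternalOperator} and
 * {@code registerExternalAggOperator} take precedence over the internal registry during
 * resolution.
 */
public class PPLFuncImpTable {
  // ...
}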

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 227fb43 and 229ddb7.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (7)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)

16-31: Well-designed utility class for UDT-aware type checking.

The utility class provides a clean, centralized API for type checking with proper UDT support. The design follows established patterns (check built-in types first, then UDTs) and integrates well with Calcite's SqlTypeUtil. This aligns with the PR's objectives to support UDTs during Calcite validation.

core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (6)

455-473: LGTM! External function override logic is correct.

The resolution logic correctly prioritizes external function implementations over internal ones, allowing data-storage-dependent implementations to override built-in functions. The null check ensures fail-fast behavior for unregistered functions.


752-760: LGTM! TOSTRING implementation handles both single and multi-arg cases correctly.

The implementation appropriately uses Calcite's CAST for the simple single-argument case and delegates to a custom operator for more complex scenarios. The nullable VARCHAR type is correctly specified.


769-794: LGTM! SPLIT implementation elegantly handles empty delimiter edge case.

The use of a CASE expression to conditionally choose between character splitting (via REGEXP_EXTRACT_ALL with '.') and standard delimiter splitting is a clean solution. The inline comments clearly explain the behavior.


531-536: LGTM! Migration to Calcite standard comparison operators.

The direct registration of comparison operators using SqlStdOperatorTable aligns with the PR objective of leveraging Calcite's built-in validation and type coercion instead of custom implementations.


968-982: LGTM! Simplified Builder enforces single-registration constraint.

The refactored Builder class enforces that each function name maps to exactly one implementation, preventing ambiguity. The use of IllegalStateException with a clear error message ensures fail-fast behavior during initialization.


984-1098: LGTM! AggBuilder follows consistent pattern with simplified registration.

The AggBuilder refactor mirrors the Builder changes, maintaining consistency across the codebase. The aggregation handler registrations are clean and leverage lambda expressions effectively. The inline implementations (AVG, COUNT, PERCENTILE_APPROX, MEDIAN, EARLIEST, LATEST, FIRST, LAST) are well-structured.

Comment on lines 22 to 178
public class OpenSearchTypeUtil {
  /**
   * Whether a given RelDataType is a user-defined type (UDT)
   *
   * @param type the RelDataType to check
   * @return true if the type is a user-defined type, false otherwise
   */
  public static boolean isUserDefinedType(RelDataType type) {
    return type instanceof AbstractExprRelDataType<?>;
  }

  /**
   * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types
   * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric
   * types, and string types (VARCHAR, CHAR).
   *
   * @param fieldType the RelDataType to check
   * @return true if the type is numeric or string, false otherwise
   */
  public static boolean isNumericOrCharacter(RelDataType fieldType) {
    // Check standard SQL numeric types & string types (VARCHAR, CHAR)
    if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) {
      return true;
    }

    // Check for OpenSearch UDT numeric types
    if (isUserDefinedType(fieldType)) {
      AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
      ExprType udtType = exprType.getExprType();
      return ExprCoreType.numberTypes().contains(udtType);
    }

    return false;
  }

  /**
   * Checks if the RelDataType represents a time-based field (timestamp, date, or time). Supports
   * both standard SQL time types (including TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE, DATE, TIME,
   * and their timezone variants) and OpenSearch UDT time types.
   *
   * @param fieldType the RelDataType to check
   * @return true if the type is time-based, false otherwise
   */
  public static boolean isDatetime(RelDataType fieldType) {
    // Check standard SQL time types
    if (SqlTypeUtil.isDatetime(fieldType)) {
      return true;
    }

    // Check for OpenSearch UDT types (EXPR_TIMESTAMP mapped to VARCHAR)
    if (isUserDefinedType(fieldType)) {
      AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
      ExprType udtType = exprType.getExprType();
      return udtType == ExprCoreType.TIMESTAMP
          || udtType == ExprCoreType.DATE
          || udtType == ExprCoreType.TIME;
    }

    return false;
  }

  /**
   * Checks whether a {@link RelDataType} represents a date type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#DATE} type and
   * OpenSearch's user-defined date type {@link OpenSearchTypeFactory.ExprUDT#EXPR_DATE}.
   *
   * @param type the type to check
   * @return true if the type is a date type (built-in or user-defined), false otherwise
   */
  public static boolean isDate(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) {
        return true;
      }
    }
    return SqlTypeName.DATE.equals(type.getSqlTypeName());
  }

  /**
   * Checks whether a {@link RelDataType} represents a timestamp type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIMESTAMP} type and
   * OpenSearch's user-defined timestamp type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIMESTAMP}.
   *
   * @param type the type to check
   * @return true if the type is a timestamp type (built-in or user-defined), false otherwise
   */
  public static boolean isTimestamp(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt()
          == OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP) {
        return true;
      }
    }
    return SqlTypeName.TIMESTAMP.equals(type.getSqlTypeName());
  }

  /**
   * Checks whether a {@link RelDataType} represents a time type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIME} type and
   * OpenSearch's user-defined time type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIME}.
   *
   * @param type the type to check
   * @return true if the type is a time type (built-in or user-defined), false otherwise
   */
  public static boolean isTime(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) {
        return true;
      }
    }
    return SqlTypeName.TIME.equals(type.getSqlTypeName());
  }

  /**
   * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
   * user-defined types also have VARCHAR as their SqlTypeName.
   */
  public static boolean isCharacter(RelDataType type) {
    return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
  }

  /**
   * Checks whether a {@link RelDataType} represents an IP address type.
   *
   * <p>This method returns true only for OpenSearch's user-defined IP type {@link
   * OpenSearchTypeFactory.ExprUDT#EXPR_IP}.
   *
   * @param type the type to check
   * @return true if the type is an IP address type, false otherwise
   */
  public static boolean isIp(RelDataType type) {
    if (isUserDefinedType(type)) {
      return ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_IP;
    }
    return false;
  }

  /**
   * Checks whether a {@link RelDataType} represents a binary type.
   *
   * <p>This method returns true for both Calcite's built-in binary types (BINARY, VARBINARY) and
   * OpenSearch's user-defined binary type {@link OpenSearchTypeFactory.ExprUDT#EXPR_BINARY}.
   *
   * @param type the type to check
   * @return true if the type is a binary type (built-in or user-defined), false otherwise
   */
  public static boolean isBinary(RelDataType type) {
    if (isUserDefinedType(type)) {
      return ((AbstractExprRelDataType<?>) type).getUdt()
          == OpenSearchTypeFactory.ExprUDT.EXPR_BINARY;
    }
    return SqlTypeName.BINARY_TYPES.contains(type.getSqlTypeName());
  }
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for test files related to OpenSearchTypeUtil
find . -type f -name "*OpenSearchTypeUtil*Test*.java" 2>/dev/null | head -20

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Also search for references to OpenSearchTypeUtil in test files
rg -l "OpenSearchTypeUtil" --type java -g "*Test*.java" 2>/dev/null

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check if the main file exists in the context provided
ls -la core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java 2>/dev/null || echo "File not found in repository"

Repository: opensearch-project/sql

Length of output: 190


🏁 Script executed:

# Search for method references in any test files
rg "isUserDefinedType|isNumericOrCharacter|isDatetime|isDate|isTimestamp|isTime|isCharacter|isIp|isBinary" --type java -g "*Test.java" | head -30

Repository: opensearch-project/sql

Length of output: 1197


🏁 Script executed:

# Also search without the Test suffix to catch integration tests (*IT.java)
rg "isUserDefinedType|isNumericOrCharacter|isDatetime|isDate|isTimestamp|isTime|isCharacter|isIp|isBinary" --type java -g "*IT.java" | head -30

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check for any test files in the calcite/utils directory
find core/src/test -type f -path "*calcite/utils*" -name "*.java" 2>/dev/null

Repository: opensearch-project/sql

Length of output: 288


🏁 Script executed:

# Search for any direct callers of OpenSearchTypeUtil in the codebase
rg "OpenSearchTypeUtil\." --type java -g "!*Test.java" -g "!*IT.java" | wc -l

Repository: opensearch-project/sql

Length of output: 68


🏁 Script executed:

# Search for imports of OpenSearchTypeUtil in test files (could be indirect testing)
rg "import.*OpenSearchTypeUtil" --type java | grep -i test

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Check for files that call OpenSearchTypeUtil methods in tests
rg "OpenSearchTypeUtil\." --type java | grep -i "test" | head -20

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

# Get a list of test files that might indirectly test these utilities
fd -e java -t f "Test.java$" "core/src/test" | xargs grep -l "isUserDefinedType\|isNumericOrCharacter\|isDatetime\|isDate\|isTimestamp\|isTime\|isCharacter\|isIp\|isBinary" 2>/dev/null

Repository: opensearch-project/sql

Length of output: 50


Create comprehensive unit tests for OpenSearchTypeUtil utility methods.

Unit tests are mandatory per coding guidelines for all new functions. This utility class contains 9 public static methods with no dedicated test coverage. Create OpenSearchTypeUtilTest.java in core/src/test/java/org/opensearch/sql/calcite/utils/ with test cases covering:

  • isUserDefinedType() – verify UDT detection and non-UDT handling
  • isNumericOrCharacter() – test both SQL numeric/character types and UDT numeric types
  • isDatetime() – verify SQL time types and UDT time types (TIMESTAMP, DATE, TIME)
  • isDate(), isTimestamp(), isTime() – test built-in and UDT variants for each type
  • isCharacter() – verify non-UDT VARCHAR/CHAR detection only
  • isIp() – test UDT IP type detection
  • isBinary() – test both SQL binary types and UDT binary type

Include edge cases: null inputs (once null handling is added), mixed type scenarios, and both code paths (built-in vs. UDT) for each method per Calcite integration test requirements.
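
A starting-point sketch for such a test class (JUnit 5; project-class imports omitted, and the TYPE_FACTORY/createUDT helpers are assumptions that may need adapting to the real OpenSearchTypeFactory API):

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlTypeName;
import org.junit.jupiter.api.Test;

class OpenSearchTypeUtilTest {
  private final RelDataType sqlDate =
      OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.DATE);
  private final RelDataType sqlVarchar =
      OpenSearchTypeFactory.TYPE_FACTORY.createSqlType(SqlTypeName.VARCHAR);
  // Assumed helper for building a UDT-backed RelDataType
  private final RelDataType udtDate =
      OpenSearchTypeFactory.TYPE_FACTORY.createUDT(OpenSearchTypeFactory.ExprUDT.EXPR_DATE);

  @Test
  void isDateAcceptsBuiltInAndUdtVariants() {
    assertTrue(OpenSearchTypeUtil.isDate(sqlDate));
    assertTrue(OpenSearchTypeUtil.isDate(udtDate));
    assertFalse(OpenSearchTypeUtil.isDate(sqlVarchar));
  }

  @Test
  void isCharacterRejectsVarcharBackedUdt() {
    // UDTs carry VARCHAR as their SqlTypeName, so they must not count as character types
    assertFalse(OpenSearchTypeUtil.isCharacter(udtDate));
    assertTrue(OpenSearchTypeUtil.isCharacter(sqlVarchar));
  }
}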

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
(around lines 22-178), there are nine public static helpers with no unit tests;
create
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java
and implement focused JUnit tests that validate: isUserDefinedType() for both
UDT and non-UDT RelDataType instances; isNumericOrCharacter() for Calcite
numeric/character types and for AbstractExprRelDataType UDT numeric types;
isDatetime() for Calcite datetime types and UDT TIMESTAMP/DATE/TIME; isDate(),
isTimestamp(), isTime() each for built-in SqlTypeName and corresponding
OpenSearchTypeFactory.ExprUDT variants; isCharacter() ensuring it returns true
only for non-UDT VARCHAR/CHAR; isIp() for UDT EXPR_IP; isBinary() for
SqlTypeName binary types and UDT EXPR_BINARY; include mixed-type scenarios and
edge-case assertions (null inputs handling should be tested after adding
null-safety in the production methods — if null-safety is not present, assert
expected NullPointerException to reflect current behavior), and construct UDT
types by instantiating or mocking AbstractExprRelDataType and
OpenSearchTypeFactory.ExprUDT as needed so both code paths (built-in vs UDT) are
covered.

Comment on lines 29 to 177
public static boolean isUserDefinedType(RelDataType type) {
return type instanceof AbstractExprRelDataType<?>;
}

/**
* Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types
* (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric
* types, and string types (VARCHAR, CHAR).
*
* @param fieldType the RelDataType to check
* @return true if the type is numeric or string, false otherwise
*/
public static boolean isNumericOrCharacter(RelDataType fieldType) {
// Check standard SQL numeric types & string types (VARCHAR, CHAR)
if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) {
return true;
}

// Check for OpenSearch UDT numeric types
if (isUserDefinedType(fieldType)) {
AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
ExprType udtType = exprType.getExprType();
return ExprCoreType.numberTypes().contains(udtType);
}

return false;
}

/**
* Checks if the RelDataType represents a time-based field (timestamp, date, or time). Supports
* both standard SQL time types (including TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE, DATE, TIME,
* and their timezone variants) and OpenSearch UDT time types.
*
* @param fieldType the RelDataType to check
* @return true if the type is time-based, false otherwise
*/
public static boolean isDatetime(RelDataType fieldType) {
// Check standard SQL time types
if (SqlTypeUtil.isDatetime(fieldType)) {
return true;
}

// Check for OpenSearch UDT types (EXPR_TIMESTAMP mapped to VARCHAR)
if (isUserDefinedType(fieldType)) {
AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
ExprType udtType = exprType.getExprType();
return udtType == ExprCoreType.TIMESTAMP
|| udtType == ExprCoreType.DATE
|| udtType == ExprCoreType.TIME;
}

return false;
}

/**
* Checks whether a {@link RelDataType} represents a date type.
*
* <p>This method returns true for both Calcite's built-in {@link SqlTypeName#DATE} type and
* OpenSearch's user-defined date type {@link OpenSearchTypeFactory.ExprUDT#EXPR_DATE}.
*
* @param type the type to check
* @return true if the type is a date type (built-in or user-defined), false otherwise
*/
public static boolean isDate(RelDataType type) {
if (isUserDefinedType(type)) {
if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) {
return true;
}
}
return SqlTypeName.DATE.equals(type.getSqlTypeName());
}

/**
* Checks whether a {@link RelDataType} represents a timestamp type.
*
* <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIMESTAMP} type and
* OpenSearch's user-defined timestamp type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIMESTAMP}.
*
* @param type the type to check
* @return true if the type is a timestamp type (built-in or user-defined), false otherwise
*/
public static boolean isTimestamp(RelDataType type) {
if (isUserDefinedType(type)) {
if (((AbstractExprRelDataType<?>) type).getUdt()
== OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP) {
return true;
}
}
return SqlTypeName.TIMESTAMP.equals(type.getSqlTypeName());
}

/**
* Checks whether a {@link RelDataType} represents a time type.
*
* <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIME} type and
* OpenSearch's user-defined time type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIME}.
*
* @param type the type to check
* @return true if the type is a time type (built-in or user-defined), false otherwise
*/
public static boolean isTime(RelDataType type) {
if (isUserDefinedType(type)) {
if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) {
return true;
}
}
return SqlTypeName.TIME.equals(type.getSqlTypeName());
}

/**
* This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
* user-defined types also have VARCHAR as their SqlTypeName.
*/
public static boolean isCharacter(RelDataType type) {
return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
}

/**
* Checks whether a {@link RelDataType} represents an IP address type.
*
* <p>This method returns true only for OpenSearch's user-defined IP type {@link
* OpenSearchTypeFactory.ExprUDT#EXPR_IP}.
*
* @param type the type to check
* @return true if the type is an IP address type, false otherwise
*/
public static boolean isIp(RelDataType type) {
if (isUserDefinedType(type)) {
return ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_IP;
}
return false;
}

/**
* Checks whether a {@link RelDataType} represents a binary type.
*
* <p>This method returns true for both Calcite's built-in binary types (BINARY, VARBINARY) and
* OpenSearch's user-defined binary type {@link OpenSearchTypeFactory.ExprUDT#EXPR_BINARY}.
*
* @param type the type to check
* @return true if the type is a binary type (built-in or user-defined), false otherwise
*/
public static boolean isBinary(RelDataType type) {
if (isUserDefinedType(type)) {
return ((AbstractExprRelDataType<?>) type).getUdt()
== OpenSearchTypeFactory.ExprUDT.EXPR_BINARY;
}
return SqlTypeName.BINARY_TYPES.contains(type.getSqlTypeName());
}

⚠️ Potential issue | 🟠 Major

Add null parameter validation to public methods.

All public methods accept RelDataType parameters but perform no null checks. Since this is a new public utility API, consider adding defensive null checks or documenting the non-null requirement with annotations.

🔎 Example fix for null validation

Option 1: Add explicit null checks with meaningful exceptions:

 public static boolean isUserDefinedType(RelDataType type) {
+  if (type == null) {
+    throw new IllegalArgumentException("RelDataType parameter cannot be null");
+  }
   return type instanceof AbstractExprRelDataType<?>;
 }

Option 2: Use Lombok's @NonNull annotation (requires adding Lombok dependency if not already present):

+import lombok.NonNull;
+
-public static boolean isUserDefinedType(RelDataType type) {
+public static boolean isUserDefinedType(@NonNull RelDataType type) {
   return type instanceof AbstractExprRelDataType<?>;
 }

Apply similar changes to all other public methods in this class.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 29 to 177, all public methods that accept RelDataType
(isUserDefinedType, isNumericOrCharacter, isDatetime, isDate, isTimestamp,
isTime, isCharacter, isIp, isBinary) lack null validation; add defensive null
checks at the start of each public method (e.g., Objects.requireNonNull(type,
"type must not be null") or throw a clear NPE with a descriptive message) or
annotate parameters with a non-null annotation if your codebase prefers that
style, ensuring consistency across all methods and corresponding unit tests are
updated to expect the NPE/annotation contract.

Comment on lines +138 to +144
/**
* This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
* user-defined types also have VARCHAR as their SqlTypeName.
*/
public static boolean isCharacter(RelDataType type) {
return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
}

⚠️ Potential issue | 🟡 Minor

Improve JavaDoc formatting and grammar.

The isCharacter method uses comment-style documentation instead of the structured JavaDoc format used by other methods. Additionally, "in place for" should be "in place of".

🔎 Proposed fix
-  /**
-   * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
-   * user-defined types also have VARCHAR as their SqlTypeName.
-   */
+  /**
+   * Checks whether a {@link RelDataType} represents a character type.
+   *
+   * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because
+   * user-defined types also have VARCHAR as their SqlTypeName but are not character types.
+   *
+   * @param type the type to check
+   * @return true if the type is a character type (excluding UDTs), false otherwise
+   */
   public static boolean isCharacter(RelDataType type) {
     return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
   }
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 138 to 144, replace the comment-style documentation with proper
JavaDoc and fix the grammar: change "in place for" to "in place of", keep the
{@link SqlTypeUtil#isCharacter(RelDataType)} reference, and ensure the JavaDoc
sentence clearly explains that this method should be used instead of
SqlTypeUtil.isCharacter(...) because user-defined types also have VARCHAR as
their SqlTypeName; update punctuation and formatting to match surrounding method
docs.

Comment on lines +511 to 514
 protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) {
   register(
-      functionName,
-      (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args),
-      typeChecker);
+      functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args));
 }

🛠️ Refactor suggestion | 🟠 Major

Add null validation for operator parameter.

The registerOperator method should validate that the operator parameter is not null before creating the lambda. This prevents cryptic NPEs during function resolution and provides clearer error messages at registration time.

🔎 Proposed null check
 protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+  if (operator == null) {
+    throw new IllegalArgumentException(
+        String.format("Operator cannot be null for function %s", functionName));
+  }
   register(
       functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args));
 }
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 511 to 514, the registerOperator method currently accepts a
potentially null SqlOperator which can cause cryptic NPEs later; add an explicit
null check for the operator parameter and throw an IllegalArgumentException (or
Objects.requireNonNull with a clear message) before calling register so the
error is raised at registration time with a descriptive message.

Comment on lines +842 to +853
// Register ADD (+ symbol) for string concatenation and numeric addition
// Not creating PPL builtin operator as it will cause confusion during function resolution
FunctionImp add =
(builder, args) -> {
SqlOperator op =
(Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
? SqlStdOperatorTable.CONCAT
: SqlStdOperatorTable.PLUS;
return builder.makeCall(op, args);
};
register(ADD, add);
register(ADDFUNCTION, add);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the PPLFuncImpTable file and examine resolve method calls
find . -name "PPLFuncImpTable.java" -type f 2>/dev/null | head -5

Repository: opensearch-project/sql

Length of output: 146


🏁 Script executed:

#!/bin/bash
# Search for resolve() method calls on FunctionImp
rg -n "\.resolve\(" --type=java -g "**/calcite/**" -A 3 -B 3 | head -100

# Search for similar type checking patterns with Stream
rg -n "Stream.*allMatch.*getType" --type=java | head -50

# Search for null type handling in expression implementations
rg -n "getType\(\).*=.*null\|null.*getType\(\)" --type=java -g "**/expression/**" | head -50

Repository: opensearch-project/sql

Length of output: 11845


🏁 Script executed:

#!/bin/bash
# Get the PPLFuncImpTable.resolve method signature
sed -n '1,100p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

# Search for FunctionImp interface definition
rg -n "interface FunctionImp\|class.*FunctionImp" --type=java -A 10

# Check for Stream.allMatch with empty args handling
rg -n "allMatch.*getType\|getType.*allMatch" --type=java -B 2 -A 2 | head -50

Repository: opensearch-project/sql

Length of output: 7943


🏁 Script executed:

#!/bin/bash
# Look at the PPLFuncImpTable file around lines 842-853 for ADD function
sed -n '800,900p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

# Also search for the resolve method signature
rg -n "public.*resolve\(" ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java -A 5

# Find how OpenSearchTypeUtil.isCharacter is implemented
rg -n "isCharacter" --type=java -B 2 -A 5

Repository: opensearch-project/sql

Length of output: 19671


🏁 Script executed:

#!/bin/bash
# Check context where resolve is called - look for test patterns
rg -n "resolve.*ADD\|resolve.*ADDFUNCTION" --type=java -B 3 -A 3 | head -60

Repository: opensearch-project/sql

Length of output: 48


Add defensive guard for empty args and document the dynamic operator selection pattern.

The type-based dispatch for ADD/ADDFUNCTION works correctly in practice (types are guaranteed post-Calcite validation), but needs defensive handling and documentation:

  • Stream.of(args).allMatch() returns true for empty args, incorrectly selecting CONCAT instead of PLUS
  • No documentation explaining the Calcite-specific dynamic dispatch pattern (required by coding guidelines)

Add an empty check and Calcite-specific documentation:

FunctionImp add =
    (builder, args) -> {
      // In Calcite, dynamic operator selection at plan-build time based on validated argument types.
      // All RexNodes have types post-validation; allMatch returns true on empty stream, so guard it.
      SqlOperator op = (args.length > 0 
          && Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
          ? SqlStdOperatorTable.CONCAT
          : SqlStdOperatorTable.PLUS;
      return builder.makeCall(op, args);
    };
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 842 to 853, the ADD/ADDFUNCTION implementation uses
Stream.of(args).allMatch(...) which returns true for empty args and thus wrongly
picks CONCAT; add a defensive guard to check args.length > 0 before evaluating
allMatch and update the inline comment to document the Calcite-specific dynamic
operator selection (types are available post-Calcite validation and allMatch on
empty streams yields true), so choose CONCAT only when args.length > 0 and all
argument types are character, otherwise select PLUS.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)

131-141: Operand count validation doesn't respect throwOnFailure parameter.

While the fix now correctly passes throwOnFailure to checkSingleOperandType at line 140, the operand count validation at line 133-134 still doesn't respect the throwOnFailure parameter. When the operand count is invalid and throwOnFailure is true, the method should throw a validation exception rather than silently returning false.

🔎 Proposed fix
          @Override
          public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
            if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
+             if (throwOnFailure) {
+               throw callBinding.newValidationSignatureError();
+             }
              return false;
            }
            return IntStream.range(0, 2)
                .allMatch(
                    i ->
                        checkSingleOperandType(
                            callBinding, callBinding.operand(i), i, throwOnFailure));
          }
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)

48-81: Operand count validation doesn't enforce odd argument count requirement.

The current implementation improves upon the previous version by enforcing a minimum of 3 arguments (line 72). However, it still doesn't validate that the total argument count is odd, which is required by the function's contract (json_string + path/value pairs).

With an even argument count (e.g., 4 arguments), the validation at lines 56-68 would pass (checking only arg[1] as a path), but the runtime eval method would fail at line 104 when trying to access keys.get(i + 1) for a missing value argument.

🔎 Proposed fix
          @Override
          public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
            final int count = callBinding.getOperandCount();
+           // Validate odd argument count (json_string + path/value pairs)
+           if (count % 2 == 0) {
+             if (throwOnFailure) {
+               throw callBinding.newValidationSignatureError();
+             }
+             return false;
+           }
            for (int i = 1; i < count; i += 2) {
              RelDataType nameType = callBinding.getOperandType(i);
              if (!OpenSearchTypeUtil.isCharacter(nameType)) {
                if (throwOnFailure) {
                  throw callBinding.newError(RESOURCE.expectedCharacter());
                }
                return false;
              }
            }
            return true;
          }
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)

62-152: Consider documenting the arbitrary additional arguments behavior.

The implementation allows arbitrary arguments after the first two (array and function), as noted in the comment at line 63. However, the runtime behavior in the eval method (lines 195-208) currently only supports one captured variable. The limitation is documented in comments at lines 197-202, but users might not be aware of this constraint when calling the function.

Consider adding JavaDoc to the class or method level (see the sketch after this list) explaining:

  1. The function accepts additional arguments as captured variables for the lambda
  2. Currently only the first captured variable is fully supported
  3. The behavior difference between Function1 (no captures/index) and Function2 (captures or index)
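
A hedged wording sketch for that JavaDoc (illustrative only, not taken from the implementation):

/**
 * Transforms each element of the input array with the provided lambda.
 *
 * <p>Arguments after the first two (array, lambda) are passed to the lambda as captured
 * variables. Currently only the first captured variable is fully supported at runtime.
 *
 * <p>The lambda is evaluated as a {@code Function1} when it uses neither captures nor the
 * element index, and as a {@code Function2} when it uses either a capture or the index.
 */
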
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)

84-90: Consider instance-scoped dependency injection over static mutable state.

The static operatorTableProvider field with a public setter creates global mutable state that could complicate testing (tests must coordinate access to shared state) and scenarios where different contexts need different operator tables. While this pattern breaks circular dependencies as noted in the PR objectives, consider whether you could pass the provider via the create() factory method or constructor instead. If the static approach is necessary, document the initialization contract and consider thread-safety if setOperatorTableProvider could be called concurrently.

Alternative instance-based approach
  @Setter private static SqlOperatorTableProvider operatorTableProvider;
+  private final SqlOperatorTableProvider instanceOperatorTableProvider;

  private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType) {
+   this(config, sysLimit, queryType, operatorTableProvider);
+ }
+
+ private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType, SqlOperatorTableProvider provider) {
    this.config = config;
    this.sysLimit = sysLimit;
    this.queryType = queryType;
+   this.instanceOperatorTableProvider = provider;
    // ... rest of initialization

Then use instanceOperatorTableProvider in getValidator() instead of the static field.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 229ddb7 and e6b80e5.

📒 Files selected for processing (4)
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java
  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
  • core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java (1)
  • LambdaUtils (11-29)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5)
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)
  • CalciteToolsHelper (99-379)
core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
  • OpenSearchSparkSqlDialect (28-113)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)
  • PplTypeCoercion (43-203)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
  • PplTypeCoercionRule (45-107)
core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java (1)
  • PplValidator (50-205)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)

91-101: LGTM! Empty allowedParamTypes case is now handled correctly.

The previous review concern about invalid operand count range when allowedParamTypes is empty has been properly addressed. The code now checks for null/empty at line 92 and returns a valid range of (0,0) before computing min/max.
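
For reference, the guard being described amounts to something like the following sketch (field and collection shapes are assumptions; SqlOperandCountRanges is Calcite's standard factory):

public SqlOperandCountRange getOperandCountRange() {
  if (allowedParamTypes == null || allowedParamTypes.isEmpty()) {
    // No declared signatures: accept exactly zero operands instead of an invalid min/max range
    return SqlOperandCountRanges.of(0);
  }
  int min = allowedParamTypes.stream().mapToInt(List::size).min().getAsInt();
  int max = allowedParamTypes.stream().mapToInt(List::size).max().getAsInt();
  return SqlOperandCountRanges.between(min, max);
}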

core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)

56-68: Ensure comprehensive validation before runtime execution.

The operand type validation correctly checks that path arguments are character types, which prevents type mismatches. However, this validation should be paired with the odd-count check (flagged separately) to ensure all runtime preconditions are validated during the validation phase rather than failing later in eval().

Comment on lines +129 to +168
public SqlValidator getValidator() {
if (validator == null) {
synchronized (this) {
// Double-Checked Locking for thread-safety
if (validator == null) {
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
if (operatorTableProvider == null) {
throw new IllegalStateException(
"SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
SqlValidator.Config validatorConfig =
SqlValidator.Config.DEFAULT
.withTypeCoercionRules(PplTypeCoercionRule.instance())
.withTypeCoercionFactory(PplTypeCoercion::create)
// Use lenient conformance for PPL compatibility
.withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
// Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST)
.withDefaultNullCollation(NullCollation.LOW)
// This ensures that coerced arguments are replaced with cast version in sql
// select list because coercion is performed during select list expansion during
// sql validation. Affects 4356.yml
// See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg
.withIdentifierExpansion(true);
validator =
PplValidator.create(
statement,
config,
operatorTableProvider.getOperatorTable(),
TYPE_FACTORY,
validatorConfig);
}
}
}
return validator;
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Find CalciteServerStatement in the codebase
echo "=== Searching for CalciteServerStatement ==="
find . -type f -name "*.java" | xargs grep -l "CalciteServerStatement" | head -20

# Look for the file and class definition
echo -e "\n=== Checking if CalciteServerStatement is from Calcite library ==="
grep -r "class CalciteServerStatement\|interface CalciteServerStatement" --include="*.java" 2>/dev/null | head -5

Repository: opensearch-project/sql

Length of output: 403


🏁 Script executed:

#!/bin/bash

# Find CalcitePlanContext to examine the actual implementation
echo "=== Finding CalcitePlanContext file ==="
find . -type f -name "CalcitePlanContext.java"

# Get the complete file content to understand the context
echo -e "\n=== CalcitePlanContext content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java

Repository: opensearch-project/sql

Length of output: 11109


🏁 Script executed:

#!/bin/bash

# Search for PplValidator.create to understand how statement is used
echo "=== Searching for PplValidator.create ==="
find . -type f -name "*.java" | xargs grep -A 20 "PplValidator.create\|class PplValidator" | head -100

Repository: opensearch-project/sql

Length of output: 6457


🏁 Script executed:

#!/bin/bash

# Check how CalciteServerStatement is used elsewhere in the codebase
echo "=== CalciteServerStatement usage patterns ==="
find . -type f -name "*.java" -exec grep -H "CalciteServerStatement\|createStatement" {} \; | grep -v "\.class" | head -50

Repository: opensearch-project/sql

Length of output: 3384


🏁 Script executed:

#!/bin/bash

# Get the complete CalciteToolsHelper file
echo "=== CalciteToolsHelper content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java

Repository: opensearch-project/sql

Length of output: 14929


🏁 Script executed:

#!/bin/bash

# Also check PplValidator to see if it stores statement reference
echo "=== PplValidator.java (full file) ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java

Repository: opensearch-project/sql

Length of output: 8498


🏁 Script executed:

#!/bin/bash

# Check how CalcitePlanContext is created and used - who creates it?
echo "=== Searching for CalcitePlanContext.create() callers ==="
rg -B2 -A2 "CalcitePlanContext\.create\|new CalcitePlanContext" --type java | head -100

Repository: opensearch-project/sql

Length of output: 123


🏁 Script executed:

#!/bin/bash

# Search for CalcitePlanContext usage in test and main files
echo "=== CalcitePlanContext usage in codebase ==="
rg "CalcitePlanContext" --type java -B 1 -A 1 | head -150

Repository: opensearch-project/sql

Length of output: 15090


🏁 Script executed:

#!/bin/bash

# Look for unit tests of CalcitePlanContext to understand lifecycle
echo "=== CalcitePlanContext test files ==="
find . -type f -name "*CalcitePlanContext*Test*.java" -o -name "*Test*CalcitePlanContext*.java"

Repository: opensearch-project/sql

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Check if there is a close() method being called on CalcitePlanContext anywhere
echo "=== Searching for CalcitePlanContext close/cleanup ==="
rg "\.close\(\)|try.*CalcitePlanContext|AutoCloseable.*CalcitePlanContext" --type java

Repository: opensearch-project/sql

Length of output: 10796


🏁 Script executed:

#!/bin/bash

# Check how withPrepare is used - does it handle statement cleanup?
echo "=== CalciteToolsHelper.withPrepare usage and statement handling ==="
rg "withPrepare\|perform" core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java -B 2 -A 5

# Check if there's any statement closing elsewhere in related code
echo -e "\n=== Searching for statement management patterns ==="
rg "createStatement\(\)" --type java -B 2 -A 5 core/src/main/java/org/opensearch/sql/calcite/ | head -80

Repository: opensearch-project/sql

Length of output: 1995


🏁 Script executed:

#!/bin/bash

# Verify CalciteServerStatement is from Calcite library and check its inheritance
echo "=== Check Calcite imports for CalciteServerStatement ==="
grep -r "import.*CalciteServerStatement" core/src/main/java/org/opensearch/sql/calcite/ --include="*.java"

# Look at how statements created elsewhere are managed (e.g., in tests that use try-with-resources)
echo -e "\n=== Statement resource patterns in tests ==="
rg "try.*createStatement\|createStatement.*try" --type java -B 1 -A 3 integ-test/ | head -50

Repository: opensearch-project/sql

Length of output: 558


Address resource leak: Close the statement after validator creation.

The underlying JDBC Statement created by connection.createStatement() is never closed. Although CalciteServerStatement itself doesn't implement AutoCloseable, the JDBC Statement it is unwrapped from does hold resources requiring proper cleanup.

The statement is only needed to obtain the prepare context via statement.createPrepareContext(). Since it is not stored or reused, close the underlying JDBC Statement as soon as the validator has been created, for example with try-with-resources:

if (operatorTableProvider == null) {
  throw new IllegalStateException(
      "SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
// try-with-resources closes the underlying JDBC Statement once the validator has been built
try (Statement jdbcStatement = connection.createStatement()) {
  CalciteServerStatement statement = jdbcStatement.unwrap(CalciteServerStatement.class);
  SqlValidator.Config validatorConfig =
      SqlValidator.Config.DEFAULT
          .withTypeCoercionRules(PplTypeCoercionRule.instance())
          .withTypeCoercionFactory(PplTypeCoercion::create)
          .withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
          .withDefaultNullCollation(NullCollation.LOW)
          .withIdentifierExpansion(true);
  validator =
      PplValidator.create(
          statement,
          config,
          operatorTableProvider.getOperatorTable(),
          TYPE_FACTORY,
          validatorConfig);
} catch (SQLException e) {
  throw new RuntimeException(e);
}

Also check CalciteToolsHelper.withPrepare() for the same pattern—it creates a statement at line 137 that should be closed after OpenSearchPrepareImpl().perform() completes.

The double-checked locking implementation with volatile is correct.

🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129-168, the Statement created via
connection.createStatement().unwrap(CalciteServerStatement.class) is never
closed causing a resource leak; after using the statement to create the
validator, ensure the underlying JDBC Statement is closed (e.g., wrap
creation/use in try/finally or try-with-resources so statement.close() is always
called and handle/log any SQLException from close) and keep the existing
exception handling for creation; also review CalciteToolsHelper.withPrepare()
(around line 137) where a statement is created and ensure that statement is
likewise closed after OpenSearchPrepareImpl().perform() completes.

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)

138-144: JavaDoc grammar issue persists.

The phrase "in place for" on line 139 should be "in place of" as noted in a previous review. Additionally, the JavaDoc format differs from other methods in this class.

🔎 Proposed fix
   /**
-   * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
-   * user-defined types also have VARCHAR as their SqlTypeName.
+   * Checks whether a {@link RelDataType} represents a character type.
+   *
+   * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because
+   * user-defined types also have VARCHAR as their SqlTypeName but are not character types.
+   *
+   * @param type the type to check
+   * @return true if the type is a character type (excluding UDTs), false otherwise
    */
   public static boolean isCharacter(RelDataType type) {
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java (1)

57-79: Signature string doesn't reflect all accepted operand types.

The getAllowedSignatures method returns "CIDRMATCH(<IP>, <STRING>)" but the type checker also accepts (STRING, STRING) via OperandTypes.CHARACTER_CHARACTER.or(...). Consider updating the signature to reflect both accepted forms.

🔎 Proposed fix
               @Override
               public String getAllowedSignatures(SqlOperator op, String opName) {
-                return "CIDRMATCH(<IP>, <STRING>)";
+                return "CIDRMATCH(<IP>, <STRING>), CIDRMATCH(<STRING>, <STRING>)";
               }
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)

492-509: Consider using createCall instead of mutating the SqlCall in place.

The rewriteCall implementation mutates the SqlBasicCall via setOperator(op) before returning it. While this pattern is consistent with similar operator rewriting in PplConvertletTable, creating a new call via createCall would be more explicit and safer. The codebase uses createCall for similar transformation scenarios elsewhere, so adopting this approach would improve consistency.

         @Override
         public SqlNode rewriteCall(SqlValidator validator, SqlCall call) {
           SqlOperator op =
               call.getOperandList().size() == 2
                   ? SqlStdOperatorTable.ATAN2
                   : SqlStdOperatorTable.ATAN;
-          ((SqlBasicCall) call).setOperator(op);
-          return call;
+          return op.createCall(call.getParserPosition(), call.getOperandList());
         }
       };
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6b80e5 and cb78ea7.

📒 Files selected for processing (7)
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/org/opensearch/sql/expression/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns

  • Type handling must be consistent with project type system
  • Error handling must use appropriate exception types
  • Null handling must be explicit and documented

Files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
  • core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : All new business logic requires unit tests

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Encourage comprehensive test coverage in code reviews

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Prefer `Optional<T>` for nullable returns in Java

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
  • OpenSearchTypeFactory (62-394)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (8)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (2)

242-293: Well-structured custom type checkers for scalar validation.

The SCALAR and SCALAR_OPTIONAL_INTEGER implementations correctly delegate to OpenSearchTypeUtil.isScalar and follow the established pattern for custom SqlOperandTypeChecker implementations. The operand count validation before type checking is appropriate defensive programming.
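
For readers unfamiliar with the pattern, a minimal sketch of a checker in this style is shown below. It assumes OpenSearchTypeUtil.isScalar accepts a single RelDataType; the actual SCALAR_OPTIONAL_INTEGER constant in PPLOperandTypes may differ in details.

import java.util.Locale;
import org.apache.calcite.sql.SqlCallBinding;
import org.apache.calcite.sql.SqlOperandCountRange;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.type.SqlOperandCountRanges;
import org.apache.calcite.sql.type.SqlOperandTypeChecker;
import org.apache.calcite.sql.type.SqlTypeFamily;
import org.opensearch.sql.calcite.utils.OpenSearchTypeUtil;

final class ScalarOptionalIntegerChecker implements SqlOperandTypeChecker {
  @Override
  public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
    // Defensive operand-count validation before inspecting types, as noted above.
    if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
      return false;
    }
    boolean valid = OpenSearchTypeUtil.isScalar(callBinding.getOperandType(0));
    if (callBinding.getOperandCount() == 2) {
      valid = valid && SqlTypeFamily.INTEGER.contains(callBinding.getOperandType(1));
    }
    return valid;
  }

  @Override
  public SqlOperandCountRange getOperandCountRange() {
    return SqlOperandCountRanges.between(1, 2);
  }

  @Override
  public String getAllowedSignatures(SqlOperator op, String opName) {
    return String.format(Locale.ROOT, "%s(<SCALAR> [, <INTEGER>])", opName);
  }
}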


101-156: Good documentation of Calcite schema differences.

The inline comments explaining the difference between OpenSearch type system (INTERVAL as STRING) and Calcite SCOTT schema behavior are helpful for maintainability. This follows the coding guideline to document Calcite-specific workarounds.

core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java (1)

50-71: Clean implementation of IP function operand validation.

The custom SqlOperandTypeChecker correctly validates that the single operand is either an IP or character type, and properly uses the throwOnFailure parameter to throw a validation signature error when needed. The signature string accurately reflects both accepted input types.

core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)

26-62: Clean API simplification.

The refactored wrap method provides a simpler interface that delegates all type checking to the wrapped SqlOperandTypeChecker. The UnsupportedOperationException for paramTypes and paramNames is appropriate since these methods are not used in the current context, and the comments make this clear.
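
To illustrate the delegation, a hedged sketch follows; the class name and layout here are placeholders, since the real UDFOperandMetadata may carry additional state.

import java.util.List;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.sql.SqlCallBinding;
import org.apache.calcite.sql.SqlOperandCountRange;
import org.apache.calcite.sql.SqlOperator;
import org.apache.calcite.sql.type.SqlOperandMetadata;
import org.apache.calcite.sql.type.SqlOperandTypeChecker;

final class DelegatingOperandMetadata implements SqlOperandMetadata {
  private final SqlOperandTypeChecker delegate;

  private DelegatingOperandMetadata(SqlOperandTypeChecker delegate) {
    this.delegate = delegate;
  }

  static DelegatingOperandMetadata wrap(SqlOperandTypeChecker checker) {
    return new DelegatingOperandMetadata(checker);
  }

  @Override
  public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
    return delegate.checkOperandTypes(callBinding, throwOnFailure);
  }

  @Override
  public SqlOperandCountRange getOperandCountRange() {
    return delegate.getOperandCountRange();
  }

  @Override
  public String getAllowedSignatures(SqlOperator op, String opName) {
    return delegate.getAllowedSignatures(op, opName);
  }

  // Not used in this context: all type checking is delegated to the wrapped checker.
  @Override
  public List<RelDataType> paramTypes(RelDataTypeFactory typeFactory) {
    throw new UnsupportedOperationException("paramTypes is not used; checking is delegated");
  }

  @Override
  public List<String> paramNames() {
    throw new UnsupportedOperationException("paramNames is not used; checking is delegated");
  }
}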

core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)

216-224: Good explicit null handling in isScalar.

The isScalar method explicitly handles null input and documents it in the JavaDoc (@param type the type to check; may be null), which is appropriate for this method's use case. Note that other methods in this class have no null handling; that is acceptable if callers guarantee non-null inputs, but it leaves the class's null-handling conventions inconsistent.
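
A sketch of the null-tolerant shape described here; delegating to Calcite's SqlTypeUtil.isAtomic is an assumption for the sketch, not the actual implementation.

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlTypeUtil;

final class ScalarCheckSketch {
  private ScalarCheckSketch() {}

  /**
   * Returns whether the given type is a scalar type.
   *
   * @param type the type to check; may be null
   * @return true if {@code type} is non-null and atomic (assumption for this sketch)
   */
  static boolean isScalar(RelDataType type) {
    if (type == null) {
      return false; // explicit null handling, as documented in the JavaDoc above
    }
    return SqlTypeUtil.isAtomic(type);
  }
}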


159-178: Well-documented Calcite workaround for IP type.

The JavaDoc clearly explains that SqlTypeName.OTHER is "borrowed" to represent IP type during validation because SqlTypeName.IP doesn't exist. This follows the coding guideline to document Calcite-specific workarounds. The overload pattern allows callers to opt-in to accepting OTHER when appropriate.
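
To make the workaround concrete, a sketch of the opt-in overload is below; the UDT check at the bottom is a hypothetical stand-in for whatever OpenSearchTypeUtil actually does.

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.type.SqlTypeName;

final class IpTypeCheckSketch {
  private IpTypeCheckSketch() {}

  static boolean isIp(RelDataType type) {
    return isIp(type, /* acceptOther= */ false);
  }

  static boolean isIp(RelDataType type, boolean acceptOther) {
    // During SqlNode validation the IP UDT is temporarily represented as OTHER,
    // because Calcite has no SqlTypeName for IP.
    if (acceptOther && type.getSqlTypeName() == SqlTypeName.OTHER) {
      return true;
    }
    return isUserDefinedIpType(type);
  }

  // Hypothetical stand-in for the real EXPR_IP UDT check.
  private static boolean isUserDefinedIpType(RelDataType type) {
    return "EXPR_IP".equalsIgnoreCase(type.toString());
  }
}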

core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (2)

530-541: Parameter type narrowed from SqlOperator to SqlFunction.

The parameter type change from SqlOperator to SqlFunction is a narrowing change. Since this is a private method, the impact is limited to this class. This aligns with the broader refactoring converting operators to functions throughout the file.


510-514: Clear explanation for explicit SPARK function declaration.

The comment explains the design decision to explicitly declare SPARK dialect functions to avoid operator resolution issues with multiple overrides. This is helpful documentation for future maintainers.

Comment on lines +78 to +79
(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
.or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));

⚠️ Potential issue | 🟡 Minor

Inconsistent use of SqlTypeFamily.STRING vs SqlTypeFamily.CHARACTER.

Line 78 uses SqlTypeFamily.STRING while the rest of the file consistently uses SqlTypeFamily.CHARACTER for string types (e.g., lines 41, 65-66). SqlTypeFamily.STRING and SqlTypeFamily.CHARACTER are distinct in Calcite—CHARACTER is the standard family for CHAR/VARCHAR.

🔎 Proposed fix
   public static final UDFOperandMetadata NUMERIC_STRING_OR_STRING_STRING =
       UDFOperandMetadata.wrap(
-          (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
-              .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));
+          (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.CHARACTER))
+              .or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER)));
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
around lines 78 to 79, there's an inconsistent use of SqlTypeFamily.STRING;
replace SqlTypeFamily.STRING with SqlTypeFamily.CHARACTER so the operand family
call matches the rest of the file (use CHARACTER for CHAR/VARCHAR), ensuring all
string-type checks consistently use SqlTypeFamily.CHARACTER.

Comment on lines +127 to +147
return UDFOperandMetadata.wrap(
new SqlOperandTypeChecker() {
@Override
public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
return false;
}
return OpenSearchTypeUtil.isIp(callBinding.getOperandType(0), true)
&& OpenSearchTypeUtil.isIp(callBinding.getOperandType(1), true);
}

@Override
public SqlOperandCountRange getOperandCountRange() {
return SqlOperandCountRanges.of(2);
}

@Override
public String getAllowedSignatures(SqlOperator op, String opName) {
return String.format(Locale.ROOT, "%s(<IP>, <IP>)", opName);
}
});

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "CompareIpFunction|compare.*IP.*STRING|GREATER_IP|LESS_IP" --type java -g "*Test*.java" -A 3 | head -60

Repository: opensearch-project/sql

Length of output: 48


🏁 Script executed:

fd -t f "*Test*.java" | xargs rg -l "Ip|IP" | head -20

Repository: opensearch-project/sql

Length of output: 1964


🏁 Script executed:

cat -n core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java

Repository: opensearch-project/sql

Length of output: 8471


🏁 Script executed:

find . -type f -name "OpenSearchTypeUtil.java" | head -5

Repository: opensearch-project/sql

Length of output: 143


🏁 Script executed:

rg -n "isIp.*acceptOther|public.*isIp.*boolean" --type java -A 15 | head -80

Repository: opensearch-project/sql

Length of output: 1941


Class-level documentation claims unsupported STRING operand signatures.

The class Javadoc (lines 42-47) documents signatures (IP, STRING) and (STRING, IP), but the type checker uses isIp(type, true) which only accepts:

  • User-defined EXPR_IP type
  • SqlTypeName.OTHER

It does not accept VARCHAR/STRING types. While the runtime implementation can convert String objects to ExprIpValue (lines 194-195), these conversions only occur after type validation, which rejects VARCHAR operands. The getAllowedSignatures method correctly reflects the actual type checker behavior.

Update the class Javadoc to remove the STRING signature claims, or modify the type checker to accept VARCHAR types if STRING support is intended.
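
If the STRING signatures are in fact intended, a hedged sketch of that second option follows (the helper name is ours; OpenSearchTypeUtil.isIp and SqlCallBinding come from the quoted checker, and SqlTypeUtil.isCharacter is Calcite's standard predicate).

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.SqlCallBinding;
import org.apache.calcite.sql.type.SqlTypeUtil;
import org.opensearch.sql.calcite.utils.OpenSearchTypeUtil;

final class IpOperandPredicates {
  private IpOperandPredicates() {}

  // Accepts the IP UDT (including its OTHER stand-in used during validation)
  // or a character operand, matching the documented (IP, STRING) / (STRING, IP) signatures.
  static boolean isIpOrCharacter(SqlCallBinding callBinding, int ordinal) {
    RelDataType type = callBinding.getOperandType(ordinal);
    return OpenSearchTypeUtil.isIp(type, true) || SqlTypeUtil.isCharacter(type);
  }
}

Inside checkOperandTypes the return would then become isIpOrCharacter(callBinding, 0) && isIpOrCharacter(callBinding, 1), with getAllowedSignatures and the accompanying tests updated to match.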

🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
around lines 127 to 147, the runtime only accepts EXPR_IP/OTHER via
OpenSearchTypeUtil.isIp(type, true) and rejects VARCHAR/STRING, so update the
class Javadoc (earlier Javadoc lines ~42-47) to remove the documented (IP,
STRING) and (STRING, IP) signatures (or alternatively, if STRING support is
intended, change the type checker here to also accept
SqlTypeName.VARCHAR/CHAR/STRING and add tests); ensure the Javadoc signatures
match the actual type checker behavior and, if keeping STRING support, also
move/adjust validation so the runtime string-to-ExprIpValue conversion (around
lines 194-195) is reachable during type checking.


Labels

calcite (calcite migration related), enhancement (New feature or request)


Development

Successfully merging this pull request may close these issues.

  • [FEATURE] Type Conversion - Leverage RelDataType from Calcite in V3 engine.
  • [RFC] Implicit Type Coercion with Calcite
