Use Calcite's validation system for type checking & coercion #4892
Conversation
📝 Walkthrough

Introduces a Calcite-backed validation and implicit type coercion pipeline for PPL by converting RelNode ↔ SqlNode around validation. Adds a PPL-specific validator, coercion rules, converters/shuttles, and convertlets. Refactors type utilities/factories, operand metadata, and the function registry. Updates numerous tests (JSON → YAML expectations) and docs. Adjusts window collation embedding and arithmetic/type derivation.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor Client
participant QueryService
participant RelBuilder as PPL RelNode
participant RelToSql as PplRelToSqlNodeConverter
participant Validator as PplValidator/TypeCoercion
participant SqlToRel as PplSqlToRelConverter
participant Optimizer as Planner/Rules
participant OpenSearch as DataStore
Client->>QueryService: executeWithCalcite/ explainWithCalcite(PPL)
QueryService->>RelBuilder: build initial RelNode
QueryService->>RelBuilder: apply PplRelToSqlRelShuttle (normalize literals)
QueryService->>RelToSql: convert RelNode -> SqlNode
QueryService->>Validator: validate SqlNode (implicit casts, UDT mapping)
alt Validation tolerant failure
Validator-->>QueryService: throws known issue
QueryService-->>QueryService: fallback to original RelNode
else Success
Validator-->>QueryService: validated SqlNode
QueryService->>SqlToRel: SqlNode -> RelNode (Spark SEMI/ANTI fixups)
end
QueryService->>Optimizer: optimize RelNode
Optimizer->>OpenSearch: pushdown (PROJECT/FILTER/AGG/SORT)
OpenSearch-->>QueryService: results/plan
QueryService-->>Client: response
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning, 1 inconclusive); ✅ Passed checks (3 passed)
Force-pushed from e085f81 to fc6dd27 (compare)
Commits in this push (each signed off by Yuanchun Shen <yuanchu@amazon.com>):
- …checking (conflicts resolved in core/src/main/java/org/opensearch/sql/executor/QueryService.java)
- … logics
- 2 more ITs passed in PPLBuiltinFunctionIT (conflicts resolved in core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java)
- fixes testRand, where deserialization of Sarg does not restore its type; todo: update the toRex in ExtendedRelJson to align with the latest version
- …estamp; (time, timestamp) -> timestamp (1240/1599)
- …2/1872)
- allow type cast; rewrite call to SQL compare to custom IP compare (conflicts resolved in core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java)
- …1356/1599 | 1476/1915)
- …d in mvindex's implementation (1580/2015)
- …iting (1579/2015)
- …pe hint
- …e inference (1701/2015)
- (several additional commits show only the sign-off line)
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
docs/user/ppl/functions/collection.md (1)
165-189: Multiple collection function examples disabled—add user guidance and timeline.
Five examples across the TRANSFORM and MVMAP sections (lambda-based functions) are disabled pending issue #4972. These appear to be core use cases:
- TRANSFORM: Both single-param (x → x + 2) and two-param ((x, i) → x + i) examples disabled
- MVMAP: All three examples disabled, including the nested-expression case
This leaves users with no working documentation examples for these functions. For now, consider adding a single prominent note at the beginning of each disabled section (rather than per-example) explaining the limitation and expected resolution timeline, or provide workaround patterns if feasible.
Also applies to: 817-863
docs/user/ppl/cmd/bin.md (1)
453-463: Expand context for the disabled binning example with strings.
Example 20 references issue #4973 but lacks an explanation of the limitation. Consider adding a brief note describing why bin with string fields is unsupported and suggesting an alternative if one exists, so users understand the constraint rather than assuming a documentation error.
integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java (1)
40-758: Some YAML resource files are missing.
ExplainIT.java references 58 YAML files, but only 56 exist in integ-test/src/test/resources/expectedOutput/ppl/. The missing files are:
- explain_filter_function_script_push.yaml
- explain_filter_script_push.yaml
Both are referenced in test methods marked with @Ignore("The serialized string is unstable because of function properties"), so these tests won't execute. Either create the YAML files for completeness when re-enabling these tests, or remove the loadExpectedPlan() calls from the ignored test methods.
core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)
21-21: Add proper JavaDoc to public class and methods.
The coding guidelines require all public classes and methods to have proper JavaDoc. Please add:
- Class-level JavaDoc explaining the purpose of RelevanceQueryFunction
- JavaDoc for getReturnTypeInference() with a @return tag
- Conversion of the comment block (lines 32-38) to proper JavaDoc format for getOperandMetadata() with a @return tag
As per coding guidelines, "All public classes and methods must have proper JavaDoc" and "Public methods MUST have JavaDoc with @param, @return, and @throws."
📝 Example JavaDoc structure
+/** + * User-defined function for relevance query operations. + * This function is only supported when pushed down to the backend. + */ public class RelevanceQueryFunction extends ImplementorUDF { public RelevanceQueryFunction() { super(new RelevanceQueryImplementor(), NullPolicy.ANY); } + /** + * Returns the type inference for this function. + * + * @return BOOLEAN type inference + */ @Override public SqlReturnTypeInference getReturnTypeInference() { return ReturnTypes.BOOLEAN; } - /* - * The first parameter is always required (either fields or query). + /** + * Defines the operand metadata for relevance query functions. + * + * The first parameter is always required (either fields or query). * The second parameter is query when fields are present, otherwise it's the first parameter. * Starting from the 3rd parameter (or 2nd when no fields), they are optional parameters for relevance queries. * Different query has different parameter set, which will be validated in dedicated query builder. * Query parameter is always required and cannot be null. + * + * @return operand metadata accepting 1-25 MAP-type parameters */ @Override public UDFOperandMetadata getOperandMetadata() {Also applies to: 28-30, 40-45
core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java (1)
466-490: Add ranking window functions to WINDOW_FUNC_MAPPING to support ROW_NUMBER, RANK, and DENSE_RANK in the Calcite path.
The current implementation of visitWindowFunction validates window functions against BuiltinFunctionName.WINDOW_FUNC_MAPPING, which only includes aggregate window functions (MAX, MIN, AVG, COUNT, SUM, etc.). Ranking window functions like ROW_NUMBER, RANK, and DENSE_RANK are defined as enum constants in BuiltinFunctionName but are missing from WINDOW_FUNC_MAPPING. This causes these functions to fail with UnsupportedOperationException when used through the Calcite path, even though PlanUtils.makeOver has handling for ROW_NUMBER.
Add the missing ranking window functions to WINDOW_FUNC_MAPPING:
- row_number → ROW_NUMBER
- rank → RANK
- dense_rank → DENSE_RANK
Verify that PplValidator or another upstream component does not already handle this validation separately before adding, to avoid duplication.
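For reference, a hedged sketch of the shape such an addition could take is below. The real WINDOW_FUNC_MAPPING lives in BuiltinFunctionName and its key and value types may differ, so the names here are illustrative only, not the actual constant.

```java
import com.google.common.collect.ImmutableMap;

// Illustrative only: mirrors the three ranking entries this comment asks to add
// alongside the existing aggregate window-function entries. The actual map in
// BuiltinFunctionName may be keyed by FunctionName/enum values rather than Strings.
final class WindowFuncMappingSketch {
  static final ImmutableMap<String, String> WINDOW_FUNC_MAPPING =
      ImmutableMap.<String, String>builder()
          .put("max", "MAX") // existing aggregate entries ...
          .put("row_number", "ROW_NUMBER") // proposed ranking additions
          .put("rank", "RANK")
          .put("dense_rank", "DENSE_RANK")
          .build();
}
```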
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)
243-247: Enhance JavaDoc with parameter and return value documentation.
The method has an @Override annotation, making public visibility necessary to match the parent class CalcitePrepareImpl#getPreparingStmt. Per coding guidelines, add @param tags for context, elementType, catalogReader, and planner, and a @return tag documenting the CalcitePreparingStmt object returned.
🧹 Nitpick comments (60)
core/src/main/java/org/opensearch/sql/expression/function/udf/math/DivideFunction.java (1)
45-48: Consider adding JavaDoc for consistency with coding standards.While the method's purpose is clear, adding JavaDoc would align with the project's coding guidelines that require documentation for public methods.
📝 Suggested JavaDoc
+ /** + * Returns the SQL kind for this operator. + * + * @return SqlKind.DIVIDE indicating this is a division operation + */ @Override public SqlKind getKind() { return SqlKind.DIVIDE; }core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java (4)
59-65: Consider adding braces for single-line conditionals.
The single-line if statements without braces reduce readability and can lead to maintenance issues.
Suggested improvement
if (function1 instanceof SqlCall call) { List<SqlNode> operands = call.getOperandList(); // The first argument is base (accumulator), while the second is from the array - if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), baseType); - if (operands.size() > 1 && elementType != null) - validator.setValidatedNodeType(operands.get(1), elementType); + if (!operands.isEmpty()) { + validator.setValidatedNodeType(operands.get(0), baseType); + } + if (operands.size() > 1 && elementType != null) { + validator.setValidatedNodeType(operands.get(1), elementType); + } }
71-74: Same style issue: add braces for the single-line conditional.Suggested improvement
if (function2 instanceof SqlCall call) { List<SqlNode> operands = call.getOperandList(); - if (!operands.isEmpty()) validator.setValidatedNodeType(operands.get(0), returnType); + if (!operands.isEmpty()) { + validator.setValidatedNodeType(operands.get(0), returnType); + } }
46-84: Missing JavaDoc for public method.
As per coding guidelines, public methods should have JavaDoc with @return documentation. This method has complex type inference logic that would benefit from documentation explaining the different binding paths.
Suggested JavaDoc
+ /** + * Returns type inference logic for the reduce function. + * <p> + * For {@link RexCallBinding}, returns the last operand's type directly. + * For {@link SqlCallBinding}, derives the return type by manipulating validator's + * validated node types to properly infer lambda return types. + * + * @return the SQL return type inference function + */ @Override public SqlReturnTypeInference getReturnTypeInference() {
109-139: Consider improving type safety in the eval method.
While unchanged, this method uses raw types and unchecked casts. A @SuppressWarnings("unchecked") annotation would document intentional unchecked operations, improving maintainability.
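A minimal sketch of scoping such a suppression to the single unchecked cast is below; the wrapper class and variable names are illustrative, not the actual eval() implementation.

```java
import java.util.List;

final class ReduceEvalSketch {
  // Annotate the narrowest declaration rather than the whole method so that other
  // warnings inside eval() still surface.
  static Object firstArgAsList(Object... args) {
    @SuppressWarnings("unchecked")
    List<Object> target = (List<Object>) args[0];
    return target;
  }
}
```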
24-33: Consider expanding test coverage for mixed-type scenarios.The test validates the core behavior of mixed-type IN predicates with Calcite. To strengthen test coverage and align with testing guidelines, consider adding test cases for:
- NULL values in the IN list
- Empty IN list
- Different numeric type combinations (long, double with string)
- All-string or all-numeric IN lists for comparison
- Boundary conditions (single value, many values)
Based on learnings, testing SQL generation and optimization paths for Calcite integration changes helps ensure correctness.
Example additional test cases
@Test public void testInWithNullInList() throws IOException { // Verify NULL handling in IN predicate JSONObject result = executeQuery( String.format("source=%s | where balance in (4180, null, '6077') | fields firstname", TEST_INDEX_ACCOUNT)); // Add appropriate assertions } @Test public void testInWithAllStringTypes() throws IOException { // Verify all-string IN predicate still works JSONObject result = executeQuery( String.format("source=%s | where balance in ('4180', '5686', '6077') | fields firstname", TEST_INDEX_ACCOUNT)); // Add appropriate assertions } @Test public void testInWithMixedNumericTypes() throws IOException { // Verify different numeric types coerce properly JSONObject result = executeQuery( String.format("source=%s | where balance in (4180, 5686.0, '6077') | fields firstname", TEST_INDEX_ACCOUNT)); // Add appropriate assertions }integ-test/src/test/resources/expectedOutput/calcite/explain_limit_agg_pushdown3.yaml (1)
1-15: LGTM! Comprehensive test case for aggregation and limit pushdown.This expected output YAML effectively tests a complex query plan involving:
- Multiple limit levels (system limit, pagination, inner fetch)
- Offset handling for pagination
- Aggregation pushdown to OpenSearch with composite buckets
- Edge cases like missing value handling (missing_bucket: true, missing_order: first)
The logical and physical plans are consistent, and the PushDownContext correctly reflects the optimizations (AGGREGATION and LIMIT pushdowns). The test data is realistic and representative of production query patterns.
Note: Line 15 is quite long (~800+ characters) due to the serialized PushDownContext with embedded JSON. While this makes manual reading challenging, it's appropriate for test fixtures where exact string matching is required.
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (6)
6-6: Package naming convention issue.
The package name CollectionUDF uses PascalCase, which violates Java naming conventions. Package names should be all lowercase (e.g., collectionudf or collection.udf). This appears to be a pre-existing issue, but it's worth noting for future refactoring.
84-102: Unreachable switch cases.
The ANY and IGNORE cases in this switch statement are unreachable because the families list on lines 66-67 only contains ARRAY and FUNCTION. Consider removing this dead code to improve clarity, or document why it's preserved (e.g., for future extensibility).
62-150: Missing JavaDoc for the anonymous class methods.
Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws. The methods in this anonymous SqlSingleOperandTypeChecker implementation lack documentation. While the class-level comment references FamilyOperandTypeChecker, adding brief documentation for checkOperandTypes, getOperandCountRange, and getAllowedSignatures would improve maintainability.
166-167: Unchecked cast without null/type validation.
The cast (List<Object>) args[0] could throw a ClassCastException if args[0] is not a List. Consider adding a defensive check or documenting the contract that guarantees args[0] is always a List.
🔎 Proposed defensive check
public static Object eval(Object... args) { - List<Object> target = (List<Object>) args[0]; + if (!(args[0] instanceof List)) { + throw new IllegalArgumentException("First argument must be a List"); + } + @SuppressWarnings("unchecked") + List<Object> target = (List<Object>) args[0]; List<Object> results = new ArrayList<>();
185-187: Broad exception handling loses context.
Catching a generic Exception and re-throwing it as a RuntimeException discards the original exception's type information. Consider catching more specific exceptions or preserving the exception chain with a descriptive message.
🔎 Example improvement
} catch (Exception e) { - throw new RuntimeException(e); + throw new RuntimeException("Failed to apply lambda function to array element", e); }Also applies to: 215-217
46-49: Missing JavaDoc on public methods.
Per coding guidelines, public methods should have JavaDoc with @param, @return, and @throws. The methods getReturnTypeInference(), getOperandMetadata(), and eval() are public but lack documentation. Consider adding JavaDoc to improve maintainability.
1-10: Test data covers basic range query; verify edge cases in companion tests.The test data uses realistic timestamp values ('2023-01-01' and '2023-01-03') for a basic range filter scenario. As per coding guidelines, ensure companion test files cover edge cases and boundary conditions such as:
- Null/missing timestamp values
- Timezone handling (the generated query uses UTC 'Z' timezone)
- Boundary inclusivity (>= vs >, < vs <=)
- Invalid/malformed timestamp strings (especially relevant given PR's CAST → SAFE_CAST changes)
- Single-point equality vs range queries
integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java (3)
154-159: Complete the migration to YAML-based assertions.
This test still uses JSON format (explain_count_agg_push.json and assertJsonEqualsIgnoreId) while most other tests have been migrated to YAML. For consistency with the PR's migration objectives, consider completing the migration to YAML format.
🔎 Suggested migration to YAML
@Test public void testCountAggPushDownExplain() throws IOException { - String expected = loadExpectedPlan("explain_count_agg_push.json"); - assertJsonEqualsIgnoreId( + String expected = loadExpectedPlan("explain_count_agg_push.yaml"); + assertYamlEqualsIgnoreId( expected, - explainQueryToString("source=opensearch-sql_test_index_account | stats count() as cnt")); + explainQueryYaml("source=opensearch-sql_test_index_account | stats count() as cnt")); }Note: This assumes the corresponding YAML resource file exists or will be created.
192-198: Complete the migration to YAML-based assertions.This test still uses JSON format while most other tests have been migrated to YAML. For consistency, consider migrating this test to YAML format as well.
🔎 Suggested migration to YAML
@Test public void testSortWithTypePushDownExplain() throws IOException { - String expected = loadExpectedPlan("explain_sort_type_push.json"); - assertJsonEqualsIgnoreId( + String expected = loadExpectedPlan("explain_sort_type_push.yaml"); + assertYamlEqualsIgnoreId( expected, - explainQueryToString( + explainQueryYaml( "source=opensearch-sql_test_index_account | sort num(age) | fields age")); }Note: This assumes the corresponding YAML resource file exists or will be created.
201-213: Complete the migration to YAML-based assertions.This test still uses JSON format while most other tests have been migrated to YAML. For consistency with the broader migration in this PR, consider migrating this test to YAML format.
🔎 Suggested migration to YAML
@Test public void testSortWithAggregationExplain() throws IOException { // Sorts whose by fields are aggregators should not be pushed down - String expected = loadExpectedPlan("explain_sort_agg_push.json"); - assertJsonEqualsIgnoreId( + String expected = loadExpectedPlan("explain_sort_agg_push.yaml"); + assertYamlEqualsIgnoreId( expected, - explainQueryToString( + explainQueryYaml( "source=opensearch-sql_test_index_account" + "| stats avg(age) AS avg_age by state, city " + "| sort avg_age")); // sorts whose by fields are not aggregators can be pushed down. // This test is covered in testExplain }Note: This assumes the corresponding YAML resource file exists or will be created.
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_then_sort_push.yaml (1)
7-7: Consider maintainability of explicit field listings.
The expanded LogicalProject explicitly lists all 17 fields. While this provides full visibility for test validation, consider whether:
- This verbosity is necessary for the test's purpose (verifying limit-then-sort pushdown)
- Future schema changes to the test index would require updating this expected output
If this explicit listing is required by the new Calcite-based plan generation (as opposed to a more compact wildcard or implicit projection), this is acceptable. The field list is realistic and the indices ($0-$16) are correctly maintained through the projection chain.
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_timestamp_string.yaml (1)
1-8: Consider edge case coverage for timestamp-string comparison tests.
This expected output file tests a basic timestamp range query with exclusive bounds. Per coding guidelines for test resources, verify that related test files cover important edge cases:
- Inclusive boundary conditions (>=, <=)
- Exact boundary values (timestamps matching filter bounds)
- Null timestamp values
- Malformed or invalid timestamp strings
- Timezone variations and DST transitions
- Timestamp precision edge cases (milliseconds, microseconds, nanoseconds)
While this file appropriately represents the expected output for its specific scenario, comprehensive edge case coverage is important for the new Calcite-based type coercion system introduced in this PR.
As per coding guidelines, ensure test data covers edge cases and boundary conditions.
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q2.yaml (2)
8-8: Verify performance implications of the FILTER→SCRIPT pushdown change.
The physical plan changed from FILTER -> <>($0, 0) to SCRIPT -> <>(SAFE_CAST($0), 0), shifting from a native OpenSearch filter to script-based evaluation. OpenSearch scripts typically have higher overhead than native filters and may not benefit from index optimizations.
Please confirm:
- Is this change intentional and necessary for the type coercion migration?
- Have performance benchmarks been run comparing FILTER vs SCRIPT approaches for simple predicates?
- Are there plans to optimize back to native filters for cases where SAFE_CAST is unnecessary?
Based on learnings, test SQL generation and optimization paths for Calcite integration changes.
1-8: Consider adding test coverage for SAFE_CAST edge cases.
This test validates the happy path where SAFE_CAST succeeds, but doesn't cover edge cases such as:
- SAFE_CAST returning NULL when type conversion fails
- How NULL values are handled in the <> comparison
- Malformed or out-of-range values that trigger safe cast failures
Since this PR introduces a new type coercion framework with SAFE_CAST, consider adding complementary test cases that validate error handling and null semantics.
Would you like me to help identify other clickbench queries that could be extended to cover these edge cases, or open an issue to track comprehensive SAFE_CAST test coverage?
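As an illustration of the first bullet, a hedged integration-test sketch is below. The query text, index constant, and helper methods are assumptions modeled on the existing *IT classes in this PR, and the expected rows would need to be filled in before the test is usable.

```java
// Hypothetical IT sketch: a malformed numeric literal should coerce to NULL under
// SAFE_CAST instead of failing the whole query, so only the valid literal can match rows.
@Test
public void testSafeCastWithMalformedStringLiteral() throws IOException {
  JSONObject result =
      executeQuery(
          String.format(
              "source=%s | where balance in ('not-a-number', 4180) | fields firstname",
              TEST_INDEX_ACCOUNT));
  // Add appropriate assertions once the expected rows are confirmed
}
```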
integ-test/src/test/resources/expectedOutput/calcite/clickbench/q37.yaml (1)
11-11: Physical plan expectation is comprehensive but potentially brittle.The physical plan includes detailed script serialization with SAFE_CAST operations that correctly mirror the logical plan. While this provides thorough validation of the Calcite-to-OpenSearch translation path, the embedded base64-encoded script and detailed request builder structure make this expectation sensitive to:
- Script serialization format changes
- Internal representation tweaks
- Field ordering or naming adjustments
Consider whether a more focused assertion on the key elements (SAFE_CAST presence, EXPR_TIMESTAMP usage, field mappings) might be more maintainable while still validating the critical behavior introduced by this PR.
Alternatively, if comprehensive end-to-end validation is the goal, ensure that test failure messages clearly indicate which part of the expectation mismatched (logical vs physical, script content vs request structure) to aid debugging.
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_function_script_push.yaml (1)
1-8: Consider adding test cases for SAFE_CAST edge cases.
This test expectation validates the explain plan structure with SAFE_CAST, but uses well-formed integer literals (32, 39225). Per the PR objectives stating "replace CAST with SAFE_CAST for malformed strings," consider whether additional test cases should cover:
- Malformed or invalid string inputs that SAFE_CAST should handle gracefully
- Null value handling through SAFE_CAST
- Type mismatches (e.g., comparing numeric fields to non-numeric literals)
- Boundary conditions (overflow, underflow, out-of-range values)
This would ensure the test validates actual safe casting behavior, not just plan structure.
integ-test/src/test/resources/expectedOutput/calcite/explain_filter_push_compare_time_string.yaml (1)
5-5: Test data is realistic and TIME() function behavior is correct; add simpler test cases for edge case coverage.The TIME function is a standard Calcite function with multiple overloads, and the test correctly validates its behavior. EXPR_TIME VARCHAR is a valid OpenSearch user-defined type (UDT) that maps TIME to VARCHAR, and the physical plan correctly shows time extraction: Sarg[('12:00:00.123456789':EXPR_TIME VARCHAR..'19:00:00.123456789':EXPR_TIME VARCHAR)].
The test data uses OpenSearch's "hh:mm:ss a" date format (12-hour with AM/PM) and tests nanosecond precision, which is realistic and covers a boundary condition. However, to fully satisfy the coding guideline of ensuring edge case and boundary condition coverage, add simpler test cases with time-only literals (e.g., TIME('12:00:00'), TIME('00:00:00'), TIME('23:59:59')) to validate basic time comparison functionality.
core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java (1)
32-38: Consider updating the comment to reflect the simplified validation.The comment block describes the parameter structure in detail, but the new validation simply checks that all parameters are MAP types. While the comment provides useful context about parameter semantics and notes that detailed validation happens in the "dedicated query builder," consider adding a note that the type-level validation here only enforces MAP types for all parameters, with detailed parameter structure validation delegated to the query builder.
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml (1)
1-10: Test case correctly represents the expected plan structure.This expected output file properly covers:
- ✓ Nested LIMIT operations (10 and 5) with proper Calcite representation using LogicalSort(fetch=...)
- ✓ Projection chain testing (all fields → age only)
- ✓ Push-down optimization validation in the physical plan
- ✓ Correct YAML format matching the schema (calcite → logical/physical structure)
- ✓ Realistic field references for an OpenSearch account index
The test data appears realistic and the format is correct. The physical plan demonstrates proper optimization by pushing down both the projection (PROJECT->[age]) and the effective limit (LIMIT->5).
Optional: Consider whether the broader test suite covers additional edge cases like LIMIT 0, or cases where the inner limit exceeds the outer limit, though this is beyond the scope of this single test file.
4-6: Logical plan correctly reflects simplified projection structure.The removal of the top-level projection and adjustment of field indices ($8→$9 for email, $9→$10 for lastname) aligns with the PR's goal to simplify Calcite logical plans. The SPAN_BUCKET operation with a fixed width of 10 is correctly represented.
Consider adding complementary test cases for edge scenarios:
- NULL or missing age values
- Age values at bucket boundaries (e.g., exactly 0, 10, 20)
- Zero or negative span parameters (expected to fail validation)
- Empty result sets
These would provide more comprehensive coverage of the binning functionality under the new Calcite type-checking framework.
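For the boundary-value bullet above, a hedged integration-test sketch is shown below; the PPL text, index constant, and helper methods are assumptions modeled on the existing integration tests rather than a definitive implementation.

```java
// Hypothetical IT sketch: an age exactly on a bucket edge (30, with span=10) should land
// in the bucket that starts at 30, not the one that ends there.
@Test
public void testBinSpanAtBucketBoundary() throws IOException {
  JSONObject result =
      executeQuery(
          String.format(
              "source=%s | where age = 30 | bin age span=10 | fields age", TEST_INDEX_ACCOUNT));
  // Add appropriate assertions (the expected bucket label depends on bin's output format)
}
```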
integ-test/src/test/resources/expectedOutput/calcite/explain_bin_minspan.yaml (1)
1-12: MINSPAN_BUCKET plan correctly implements dynamic binning with window aggregations.The explain plan accurately represents the MINSPAN_BUCKET operation, which requires window functions to compute MIN/MAX boundaries before bucketing. The physical plan correctly sequences operations: full scan for window aggregation → limit → bucket calculation → final limit.
The requestedTotalSize=2147483647 (Integer.MAX_VALUE) is appropriate for window functions that need access to all rows before applying the fetch=5 limit. Consider adding test coverage for edge cases:
- All age values identical (MIN($8) = MAX($8))
- NULL age values
- Single-row result sets
- min_span parameter larger than actual data range
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java (1)
24-24: Add class-level JavaDoc.The public class lacks class-level JavaDoc. Per coding guidelines, all public classes must have proper JavaDoc documentation explaining the class's purpose, when to use it, and how it fits into the broader Calcite integration.
Suggested JavaDoc structure
+/** + * Custom SqlToRelConverter for PPL that handles Spark SQL-specific join types. + * + * <p>This converter extends Calcite's standard {@link SqlToRelConverter} to support + * LEFT SEMI JOIN and LEFT ANTI JOIN semantics, which are not natively supported by + * Calcite's default join type conversion. The converter temporarily transforms these + * join types to LEFT JOIN for Calcite processing, then corrects the join type in the + * resulting RelNode tree. + * + * @see PplRelToSqlConverter for the corresponding Rel-to-SQL conversion + */ public class PplSqlToRelConverter extends SqlToRelConverter {integ-test/src/test/resources/expectedOutput/calcite/chart_null_str.yaml (1)
1-40: Expected Calcite plan output is correct; test covers the primary null-handling scenario.
The expected output file accurately represents the Calcite logical and physical execution plans for the chart query with null string handling. The test query source=%s | chart limit=10 usenull=true nullstr='nil' avg(balance) over gender by age span=10 does test null handling through:
- IS NULL checks in the CASE expression
- SAFE_CAST for type conversions
- SPAN with null parameter (null:ANY)
- Boundary coverage at age=30 (span=10 with data ages 28-36)
- Multiple null/missing values in test data (age, balance, gender fields)
However, test data could be expanded to cover additional edge cases: empty strings, non-numeric string values, and more span boundary conditions (exactly 0, negative values, larger boundaries). The current data is representative for the primary null-handling scenario but limited for comprehensive boundary testing.
core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java (1)
40-43: Add JavaDoc for public method.
The new getKind() method is public but lacks JavaDoc documentation. Per coding guidelines, all public methods should have proper JavaDoc with @return documentation.
📝 Suggested JavaDoc addition
+ /** + * Returns the SQL kind for this function to enable Calcite dialect integration. + * + * @return SqlKind.MOD to identify this function as a modulo operation + */ @Override public SqlKind getKind() { return SqlKind.MOD; }core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java (1)
149-163: Constructor placement after methods is unconventional.
The constructor on lines 149-163 appears after the visit methods (lines 127-147). Consider moving the constructor to appear after the field declarations (after line 96) for better readability, following standard Java class structure conventions.
21-21: Missing class-level JavaDoc.Per coding guidelines, all public classes should have proper JavaDoc. Add documentation explaining the shuttle's purpose (schema qualification stripping, COUNT normalization, tuple IN/NOT_IN ROW-wrapping).
Suggested JavaDoc
+/** + * A SqlShuttle that rewrites SQL nodes during validation to ensure compatibility + * with Calcite's type derivation and SQL-to-Rel conversion. + * + * <p>Rewrites performed: + * <ul> + * <li>Strips OpenSearch schema qualifier from two-part identifiers + * <li>Converts COUNT() to COUNT(*) for proper type derivation + * <li>Wraps SqlNodeList in ROW for tuple IN/NOT_IN predicates + * </ul> + */ public class SqlRewriteShuttle extends SqlShuttle {
62-65: Mutating the input SqlCall may cause unintended side effects.
The setOperand(0, ...) call on line 64 mutates the original SqlCall object before passing it to super.visit(). If the same SqlCall instance is shared or visited multiple times, this mutation could cause unexpected behavior.
Consider creating a new SqlBasicCall with the modified operand instead:
🔎 Proposed fix
if (!call.getOperandList().isEmpty() && call.getOperandList().get(0) instanceof SqlNodeList nodes) { - call.setOperand(0, SqlStdOperatorTable.ROW.createCall(nodes)); + List<SqlNode> newOperands = new ArrayList<>(call.getOperandList()); + newOperands.set(0, SqlStdOperatorTable.ROW.createCall(nodes)); + call = call.getOperator().createCall( + call.getFunctionQuantifier(), + call.getParserPosition(), + newOperands.toArray(new SqlNode[0])); }core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java (2)
140-171: toSqlHint lacks null-safety for the hint parameter.
The toSqlHint method checks hint.kvOptions and hint.listOptions for null but doesn't guard against a null hint parameter itself. If a null hint is passed (e.g., from a stream with null elements), this will throw an NPE.
🔎 Suggested fix
private static SqlHint toSqlHint(RelHint hint, SqlParserPos pos) { + if (hint == null) { + throw new IllegalArgumentException("RelHint cannot be null"); + } if (hint.kvOptions != null) {
125-129: Consider using Collectors.toList() for an immutable result.
The hint collection uses Collectors.toCollection(ArrayList::new), but SqlNodeList.of() doesn't require mutability. Using Collectors.toList() (or toList() in Java 16+) would be more idiomatic.
List<SqlNode> hints =
    e.getHints().stream()
        .map(relHint -> (SqlNode) toSqlHint(relHint, POS))
-       .collect(Collectors.toCollection(ArrayList::new));
+       .collect(Collectors.toList());
54-68: Consider adding defensive checks in the eval() method.
The eval() method performs unchecked casts that could fail at runtime:
- Line 57: (List<Object>) args[0] - NPE if args[0] is null, ClassCastException if it is not a List
- Line 60: (Boolean) lambdaFunction.apply(candidate) - ClassCastException if the lambda returns a non-Boolean
While operand type checking should prevent most misuse, adding defensive null checks would improve robustness.
🔎 Suggested defensive checks
public static Object eval(Object... args) { + if (args == null || args.length < 2 || args[0] == null || args[1] == null) { + return null; + } org.apache.calcite.linq4j.function.Function1 lambdaFunction = (org.apache.calcite.linq4j.function.Function1) args[1]; List<Object> target = (List<Object>) args[0]; try { for (Object candidate : target) { - if (!(Boolean) lambdaFunction.apply(candidate)) { + Object result = lambdaFunction.apply(candidate); + if (!(result instanceof Boolean) || !(Boolean) result) { return false; } } } catch (Exception e) { throw new RuntimeException(e); } return true; }core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonKeysFunctionImpl.java (1)
36-38: Add JavaDoc for public method.
The operand metadata change correctly enables Calcite type validation with OperandTypes.ANY, which is appropriate for a function accepting any JSON input. However, per coding guidelines, all public methods must have JavaDoc with @return documentation.
📋 Suggested JavaDoc
+ /** + * Returns the operand metadata for the JSON_KEYS function. + * + * @return UDFOperandMetadata accepting any operand type + */ @Override public UDFOperandMetadata getOperandMetadata() { return UDFOperandMetadata.wrap(OperandTypes.ANY); }core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMaxFunction.java (1)
40-42: Add JavaDoc for public method.
The operand metadata change correctly uses OperandTypes.VARIADIC for the MAX function's variable-length argument list, properly enabling Calcite type validation. However, per coding guidelines, all public methods must have JavaDoc with @return documentation.
📋 Suggested JavaDoc
+ /** + * Returns the operand metadata for the SCALAR_MAX function. + * + * @return UDFOperandMetadata accepting variadic operands + */ @Override public UDFOperandMetadata getOperandMetadata() { return UDFOperandMetadata.wrap(OperandTypes.VARIADIC); }core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java (1)
38-41: Consider adding JavaDoc for public method.Per coding guidelines, public methods should have JavaDoc documentation. Since the PR objectives list "documentation/javadocs" as outstanding work, this can be addressed in that effort.
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java (1)
43-46: Consider adding JavaDoc for public method.
While this method overrides a parent declaration, adding JavaDoc with an @return description would improve code documentation and maintainability, explaining that the method returns variadic operand metadata to support 2+ argument validation.
📝 Suggested JavaDoc addition
@Override + /** + * Returns operand metadata for json_extract function. + * + * @return Variadic operand metadata allowing 2 or more arguments + */ public UDFOperandMetadata getOperandMetadata() { return UDFOperandMetadata.wrap(OperandTypes.VARIADIC); }As per coding guidelines, public methods should have proper JavaDoc documentation.
core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java (1)
19-19: LGTM! Centralized type checking improves maintainability.
Migrating from a local dateRelatedType() method to OpenSearchTypeUtil.isDatetime() centralizes type checking logic and reduces code duplication. This aligns with the PR's goal of standardizing type handling through Calcite's RelDataType framework.
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java (1)
23-23: LGTM! Explicit operand metadata improves validation.
Replacing the null return with UDFOperandMetadata.wrap(OperandTypes.ONE_OR_MORE) enables proper operand type checking and validation for the JSON_DELETE function. This aligns with similar changes across other JSON UDF implementations in the PR.
Consider adding a unit test to verify the ONE_OR_MORE operand metadata behavior, especially for edge cases like:
- Single path argument (minimum case)
- Multiple path arguments
- Invalid operand counts (if validation is enforced)
As per coding guidelines, new business logic changes should have corresponding unit tests.
Also applies to: 40-40
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
1809-1809: Track TODO for chart pushdown regression.
This TODO indicates that chart with the limit/useother options may have regressed from optimal pushdown behavior. Consider creating an issue to track this for future resolution.
Would you like me to help create an issue to track this pushdown optimization?
integ-test/src/test/java/org/opensearch/sql/calcite/clickbench/PPLClickBenchIT.java (1)
68-71: Consider adding a comment explaining why q41 requires special handling.The code adds a conditional ignore for q41 with the comment "Ignore q41 as it needs special handling". Consider expanding this comment to briefly explain what makes q41 special (e.g., "requires alternative plan comparison" or "non-deterministic plan generation").
🔎 Suggested comment improvement
if (isCalciteEnabled()) { - // Ignore q41 as it needs special handling + // Ignore q41 as it requires separate test with alternative plan comparison ignored.add(41); }integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java (1)
458-465: Consider adding schema verification for consistency.
The test validates data rows but doesn't verify the schema, unlike the adjacent testBetweenWithIncompatibleTypes which calls verifySchema. For consistency with other BETWEEN tests in this file, consider adding schema verification.
🔎 Suggested enhancement
@Test public void testBetweenWithMixedTypes() throws IOException { JSONObject actual = executeQuery( String.format( "source=%s | where age between '35' and 38 | fields firstname, age", TEST_INDEX_BANK)); + verifySchema(actual, schema("firstname", "string"), schema("age", "int")); verifyDataRows(actual, rows("Hattie", 36), rows("Elinor", 36)); }core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
137-140: Misleading error message: the context already exists when this check runs.
The error message states the provider must be set "before creating CalcitePlanContext," but this check occurs inside getValidator() on an already-created instance. Consider rewording for accuracy.
🔎 Suggested fix
if (operatorTableProvider == null) { throw new IllegalStateException( - "SqlOperatorTableProvider must be set before creating CalcitePlanContext"); + "SqlOperatorTableProvider must be set before calling getValidator()"); }core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)
206-268: Consider adding JavaDoc for the public inner class and methods.
Per coding guidelines, public classes and methods should have proper JavaDoc. The OpenSearchPrepareImpl class and its public perform and getPreparingStmt methods would benefit from documentation describing their purpose, parameters, and return values, especially since getPreparingStmt is now part of the public API surface.
63-74: Add JavaDoc for the public getCastSpec override.
This method handles a Calcite-specific workaround for IP UDT types. Per coding guidelines, public methods should have JavaDoc with @param and @return. The inline comments are helpful, but a method-level JavaDoc would improve discoverability.
Suggested JavaDoc
+ /** + * Returns the cast specification for the given type. + * + * <p>For IP UDT types, returns a special {@link SqlDataTypeSpec} using {@link + * SqlAlienSystemTypeNameSpec} since IP borrows {@code SqlTypeName.OTHER} internally. + * + * @param type the target data type for the cast + * @return the cast specification SqlNode, or null if not applicable + */ @Override public @Nullable SqlNode getCastSpec(RelDataType type) {core/src/main/java/org/opensearch/sql/executor/QueryService.java (2)
313-328: Consider logging when tolerant validation is triggered.
The tolerant validation path at lines 324-326 silently returns the original relNode when ValidationUtils.tolerantValidationException(e) returns true. Adding a debug/trace log would help diagnose validation bypass scenarios during debugging.
Suggested enhancement
} catch (CalciteContextException e) { if (ValidationUtils.tolerantValidationException(e)) { + log.debug("Tolerant validation triggered, bypassing validation: {}", e.getMessage()); return relNode; } throw new ExpressionEvaluationException(e.getMessage(), e); }
349-352: Validate that the round-trip conversion properly restores the RelNode structure.
The final conversion back through PplRelToSqlRelShuttle with the false parameter (line 351) should reverse the interval literal transformations applied on line 311. The asymmetric boolean parameter usage (true then false) could benefit from named constants or an enum for clarity.
Consider using named constants for clarity
+ private static final boolean FIX_INTERVALS_FOR_SQL = true; + private static final boolean RESTORE_INTERVALS_FROM_SQL = false; + private RelNode validate(RelNode relNode, CalcitePlanContext context) { // ... - RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, true)); + RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, FIX_INTERVALS_FOR_SQL)); // ... - return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, false)); + return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, RESTORE_INTERVALS_FROM_SQL)); }core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
56-66: Consider adding inline comments for the IP type representation.
The use of SqlTypeName.OTHER to represent IP types may not be immediately obvious to future maintainers. A brief inline comment would improve clarity.
🔎 Suggested improvement
private static final Map<SqlTypeName, ImmutableSet<@NonNull SqlTypeName>> additionalMapping = Map.of( + // SqlTypeName.OTHER represents IP and other user-defined types SqlTypeName.OTHER, ImmutableSet.of(SqlTypeName.OTHER, SqlTypeName.VARCHAR, SqlTypeName.CHAR), SqlTypeName.VARCHAR, ImmutableSet.<SqlTypeName>builder() .add(SqlTypeName.OTHER) .addAll(SqlTypeName.NUMERIC_TYPES) .build(), SqlTypeName.CHAR, ImmutableSet.of(SqlTypeName.OTHER));core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java (2)
26-26: Make the INSTANCE field final for singleton safety.
The singleton instance should be final to prevent accidental reassignment.
🔎 Proposed fix
- public static PplConvertletTable INSTANCE = new PplConvertletTable(); + public static final PplConvertletTable INSTANCE = new PplConvertletTable();
25-27: Add class-level JavaDoc to document purpose.Per coding guidelines, public classes should have proper JavaDoc. Consider documenting the purpose of this convertlet table and when it's used in the PPL validation flow.
🔎 Proposed fix
+/** + * PPL-specific convertlet table that customizes SQL-to-Rex conversion for PPL operators. + * + * <p>This table provides custom conversion logic for: + * <ul> + * <li>IP-type comparisons: Routes to PPL IP comparison operators when operands are IP types</li> + * <li>ATAN function: Rewrites PPL ATAN to standard Calcite ATAN</li> + * </ul> + */ public class PplConvertletTable extends ReflectiveConvertletTable { + /** Singleton instance for use across the validation/conversion flow. */ public static PplConvertletTable INSTANCE = new PplConvertletTable();core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
332-386: Consider extracting the type counting logic for readability.
The method has 8 counter variables and multiple conditionals. Consider extracting the counting loop into a helper class or record (e.g., TypeCounts) to improve readability and testability.
🔎 Example structure
private record TypeCounts(int nullCount, int anyCount, int nullableCount, int dateCount, int timeCount, int ipCount, int binaryCount, int otherCount) { static TypeCounts from(List<RelDataType> types) { // counting logic here } }core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java (1)
50-56: Document the thread-safety implications of the mutable 'top' field.
The top field is mutable instance state used to track recursive deriveType calls. If the same PplValidator instance is used concurrently (which is unlikely but possible), this could cause incorrect type derivation. Consider adding a comment noting that validator instances should not be shared across threads.
41-55: Consider refining the JavaDoc for clarity.
The method name is isNumericOrCharacter, but the JavaDoc mentions "OpenSearch UDT numeric types" without mentioning that UDT character types are not included. The logic only checks ExprCoreType.numberTypes() for UDTs, meaning UDT character types would return false, which may be intentional but should be documented.
407-416: Consider stricter handling of duplicate function registrations.The method now logs a warning when a function is registered multiple times (lines 411-412) and silently overwrites the previous registration. While the warning provides visibility, this approach could mask configuration errors.
Consider:
- Throwing an exception for duplicate registrations to fail-fast on misconfiguration
- Using ERROR level instead of WARN if duplicates should never occur in production
- Documenting if duplicate registration is an expected pattern (e.g., hot-reload scenarios)
Optional: Stricter duplicate handling
if (externalFunctionRegistry.containsKey(functionName)) { - logger.warn( - String.format(Locale.ROOT, "Function %s is registered multiple times", functionName)); + throw new IllegalStateException( + String.format(Locale.ROOT, "Function %s is already registered", functionName)); }
425-438: Apply consistent duplicate handling for aggregate functions.
Similar to registerExternalOperator, this method warns on duplicates (lines 427-430) but continues with an overwrite. Apply the same duplicate handling strategy consistently across both registration methods.
648-684: Document Calcite workarounds for composite type checkers.
Several operator registrations include TODO comments about workarounds for "sequence CompositeOperandTypeChecker" (lines 650, 656, 666, 679). These manual .or() chains replace Calcite's standard composite checkers.
Ensure these workarounds are:
- Documented in code with explanation of why manual chaining is needed
- Tracked as technical debt if Calcite's standard behavior should eventually be used
- Tested to confirm the manual type checking produces equivalent behavior
As per coding guidelines: Document Calcite-specific workarounds in code.
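For illustration, a hedged sketch of the manual chaining pattern the TODOs refer to is below; the concrete member checkers registered in PPLFuncImpTable differ per operator, so the two shown here are placeholders.

```java
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlOperandTypeChecker;

final class ManualOrChainSketch {
  // Combines two explicit signatures instead of relying on a sequence-style
  // CompositeOperandTypeChecker: the OR checker accepts operands matching any member.
  static final SqlOperandTypeChecker NUMERIC_PAIR_OR_STRING_PAIR =
      OperandTypes.or(OperandTypes.NUMERIC_NUMERIC, OperandTypes.STRING_STRING);
}
```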
```java
public SqlValidator getValidator() {
  if (validator == null) {
    final CalciteServerStatement statement;
    try {
      statement = connection.createStatement().unwrap(CalciteServerStatement.class);
    } catch (SQLException e) {
      throw new RuntimeException(e);
    }
```
Potential resource leak: the statement is never closed.
The CalciteServerStatement created on line 133 is never closed. While it's used to provide context for the validator, it should either be stored for later cleanup or handled with try-with-resources if it is only needed temporarily for initialization.
🔎 Suggested fix
If the statement must remain open for the validator's lifecycle, store it as a field and close it when the context is disposed. Alternatively, if only needed for initialization:
+ /** Statement backing the validator (kept open for validator lifecycle). */
+ private CalciteServerStatement validatorStatement;
+
public SqlValidator getValidator() {
if (validator == null) {
- final CalciteServerStatement statement;
try {
- statement = connection.createStatement().unwrap(CalciteServerStatement.class);
+ validatorStatement = connection.createStatement().unwrap(CalciteServerStatement.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}Also consider implementing AutoCloseable on CalcitePlanContext to properly clean up resources.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129-136, the CalciteServerStatement created via
connection.createStatement().unwrap(...) is never closed which can leak
resources; either store this statement in a new private field and close it when
CalcitePlanContext is disposed (add a close() method or implement AutoCloseable
and close the field there), or if the statement is only needed for validator
initialization wrap its creation/use in a try-with-resources so it is closed
immediately after use; update class lifecycle to ensure the statement is closed
on context shutdown.
CalciteServerStatement does not implement AutoCloseable (although the statement created with connection.createStatement() does). Does this mean that I do not have to close it?
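A minimal sketch of the try-with-resources variant, under the assumption that the server statement is only needed while the validator is built: the java.sql.Statement returned by createStatement() is the AutoCloseable resource, and unwrap() only exposes another view of that same statement, so closing the outer Statement is what releases it.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.calcite.server.CalciteServerStatement;

final class ValidatorStatementSketch {
  // Illustrative only: if the validator keeps using the statement's context after
  // construction, the statement must instead be stored and closed with the plan context.
  static void buildValidatorWith(Connection connection) throws SQLException {
    try (Statement stmt = connection.createStatement()) {
      CalciteServerStatement serverStatement = stmt.unwrap(CalciteServerStatement.class);
      // ... derive the SqlValidator from serverStatement here ...
    } // stmt is closed here, releasing the underlying server statement
  }
}
```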
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (resolved, outdated comment)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (resolved comment)
```java
/**
 * Override to support Spark SQL's LEFT ANTI JOIN and LEFT SEMI JOIN conversion to RelNode.
 *
 * <p>The default implementation in {@link SqlToRelConverter#convertJoinType} does not expect
 * LEFT_ANTI_JOIN and LEFT_SEMI_JOIN. This override works around the limitation by first
 * temporarily changing LEFT_ANTI_JOIN/LEFT_SEMI_JOIN to LEFT join in the SqlJoin node, then
 * calling {@code super.convertFrom()} to perform normal conversion, finally substituting the join
 * type in the resulting RelNode to ANTI/SEMI.
 *
 * @param bb Scope within which to resolve identifiers
 * @param from FROM clause of a query.
 * @param fieldNames Field aliases, usually come from AS clause, or null
 */
@Override
protected void convertFrom(
    Blackboard bb, @Nullable SqlNode from, @Nullable List<String> fieldNames) {
  JoinType originalJoinType = null;
  if (from instanceof SqlJoin join) {
    JoinType joinType = join.getJoinType();
    if (joinType == JoinType.LEFT_SEMI_JOIN || joinType == JoinType.LEFT_ANTI_JOIN) {
      join.setOperand(2, JoinType.LEFT.symbol(from.getParserPosition()));
      originalJoinType = joinType;
    }
  }
  super.convertFrom(bb, from, fieldNames);
  if (originalJoinType != null) {
    RelNode root = bb.root();
    if (root != null) {
      JoinRelType correctJoinType =
          originalJoinType == JoinType.LEFT_SEMI_JOIN ? JoinRelType.SEMI : JoinRelType.ANTI;
      RelNode fixedRoot = modifyJoinType(root, correctJoinType);
      bb.setRoot(fixedRoot, false);
    }
  }
}

private RelNode modifyJoinType(RelNode root, JoinRelType correctJoinType) {
  if (root instanceof LogicalProject project) {
    RelNode input = project.getInput();
    RelNode fixedInput = modifyJoinType(input, correctJoinType);
    if (fixedInput != input) {
      return project.copy(
          project.getTraitSet(), fixedInput, project.getProjects(), project.getRowType());
    }
  } else if (root instanceof LogicalJoin join) {
    if (join.getJoinType() == JoinRelType.LEFT) {
      return join.copy(
          join.getTraitSet(),
          join.getCondition(),
          join.getLeft(),
          join.getRight(),
          correctJoinType,
          join.isSemiJoinDone());
    }
  }
  return root;
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for test files related to PplSqlToRelConverter
echo "=== Searching for PplSqlToRelConverter test files ==="
find . -name "*Test*.java" -type f | xargs grep -l "PplSqlToRelConverter" 2>/dev/null

# Search for SEMI/ANTI join tests
echo -e "\n=== Searching for SEMI/ANTI join test references ==="
find . -name "*Test*.java" -type f | xargs grep -l "SEMI_JOIN\|ANTI_JOIN\|LEFT_SEMI\|LEFT_ANTI" 2>/dev/null | head -10

# Search for convertFrom method overrides in tests
echo -e "\n=== Searching for convertFrom tests ==="
find . -name "*Test*.java" -type f | xargs grep -l "convertFrom" 2>/dev/null

# List test files in the calcite validate directory
echo -e "\n=== Test files in calcite/validate directory ==="
find . -path "*/test/*" -path "*/calcite/validate*" -name "*Test*.java" 2>/dev/null | head -20
```
Repository: opensearch-project/sql
Length of output: 359
🏁 Script executed:
#!/bin/bash
# Find the PplSqlToRelConverter file itself
echo "=== Finding PplSqlToRelConverter class file ==="
find . -name "PplSqlToRelConverter.java" -type f
# Search for any calcite-related tests
echo -e "\n=== Calcite-related test files ==="
find . -path "*/test/*" -path "*calcite*" -name "*.java" | head -20
# Check what's in SQLQueryValidatorTest related to SEMI/ANTI
echo -e "\n=== Checking SQLQueryValidatorTest for join tests ==="
find . -name "SQLQueryValidatorTest.java" -exec grep -A 5 -B 5 "SEMI_JOIN\|ANTI_JOIN" {} \;
# Search for SqlToRelConverter usage in tests
echo -e "\n=== SqlToRelConverter in tests ==="
find . -name "*Test*.java" -type f | xargs grep -l "SqlToRelConverter\|convertJoinType" 2>/dev/null
# Check the parent directory structure
echo -e "\n=== Directory structure for calcite/validate ==="
find . -path "*/calcite/validate/*" -type d | head -10Repository: opensearch-project/sql
Length of output: 4838
Add unit tests for PplSqlToRelConverter SEMI/ANTI join conversion.
Per coding guidelines for Calcite integration changes, comprehensive test coverage for SQL generation and optimization paths is required. This new converter lacks dedicated tests for its SEMI/ANTI join conversion logic. Add unit tests covering:
- LEFT_SEMI_JOIN to SEMI conversion
- LEFT_ANTI_JOIN to ANTI conversion
- Edge cases (nested projects, multiple join levels)
🤖 Prompt for AI Agents
core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java
lines 35-91: add unit tests that verify LEFT_SEMI_JOIN and LEFT_ANTI_JOIN are
converted to JoinRelType.SEMI and JoinRelType.ANTI respectively, including edge
cases (project-wrapped join and nested/multi-level joins). Create tests in the
corresponding test package (core/src/test/java/.../validate/converters) that
construct SqlJoin nodes with LEFT_SEMI_JOIN and LEFT_ANTI_JOIN, run through
PplSqlToRelConverter.convertFrom (or call the public entry point that uses it),
then assert on the resulting RelNode tree that the join nodes use SEMI/ANTI; for
project-wrapped cases ensure the converter drills through LogicalProject (assert
join type changed and project preserved), and for nested joins verify only the
target join(s) are modified. Include separate test methods with descriptive
names (e.g., convertLeftSemiJoinToSemi, convertLeftAntiJoinToAnti,
convertJoinThroughProject, convertNestedJoins) and cover both RelNode inspection
and round-trip SQL/rel equivalence where applicable.
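For illustration, a minimal sketch of what such a test could look like, assuming the join-type rewrite is reachable from tests; the PplSqlToRelConverterTestAccess.modifyJoinType accessor below is hypothetical, and the test only inspects the RelNode shape, not round-trip SQL:

```java
import static org.junit.Assert.assertEquals;

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.rel.logical.LogicalJoin;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;
import org.junit.Test;

public class PplSqlToRelConverterJoinTypeTest {

  private RelBuilder relBuilder() {
    FrameworkConfig config =
        Frameworks.newConfigBuilder().defaultSchema(Frameworks.createRootSchema(true)).build();
    return RelBuilder.create(config);
  }

  @Test
  public void leftJoinUnderProjectIsRewrittenToSemi() {
    RelBuilder b = relBuilder();
    // Project over a LEFT join mirrors the shape produced by convertFrom() before the fix-up.
    RelNode plan =
        b.values(new String[] {"id"}, 1, 2)
            .values(new String[] {"id"}, 2, 3)
            .join(JoinRelType.LEFT, b.equals(b.field(2, 0, 0), b.field(2, 1, 0)))
            .project(b.field(0))
            .build();

    // Hypothetical package-private accessor around the converter's modifyJoinType(...) helper.
    RelNode rewritten = PplSqlToRelConverterTestAccess.modifyJoinType(plan, JoinRelType.SEMI);

    // The rewrite should drill through the LogicalProject and only flip the join type.
    LogicalJoin join = (LogicalJoin) rewritten.getInput(0);
    assertEquals(JoinRelType.SEMI, join.getJoinType());
    assertEquals(plan.getRowType(), rewritten.getRowType());
  }
}
```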
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
|
|
||
| @Override | ||
| public SqlOperandCountRange getOperandCountRange() { | ||
| return getOperandMetadata().getOperandCountRange(); | ||
| } |
Potential NPE when getOperandMetadata() returns null.
The override delegates to getOperandMetadata().getOperandCountRange(), but getOperandMetadata() can return null (as seen in some UDF implementations before this PR's changes). If any UDF still returns null, this will cause a NullPointerException.
🔎 Proposed fix with null safety
@Override
public SqlOperandCountRange getOperandCountRange() {
- return getOperandMetadata().getOperandCountRange();
+ UDFOperandMetadata metadata = getOperandMetadata();
+ return metadata != null ? metadata.getOperandCountRange() : super.getOperandCountRange();
}
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java
around lines 74-78, the method currently calls
getOperandMetadata().getOperandCountRange() which can NPE if
getOperandMetadata() returns null; guard by checking if getOperandMetadata() ==
null and handle that case explicitly — either return a safe default
SqlOperandCountRange or throw an IllegalStateException with a clear message
(including the UDF class/name via getClass().getSimpleName()) so callers fail
fast and the null is surfaced safely.
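If the fail-fast option is preferred over a silent default, the override could look roughly like this method-level sketch (message wording is illustrative):

```java
@Override
public SqlOperandCountRange getOperandCountRange() {
  UDFOperandMetadata metadata = getOperandMetadata();
  if (metadata == null) {
    // Surface the misconfigured UDF immediately instead of failing later with an NPE.
    throw new IllegalStateException(
        "No operand metadata registered for UDF " + getClass().getSimpleName());
  }
  return metadata.getOperandCountRange();
}
```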
| <!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 --> | ||
| ```ppl ignore |
Fix typo in TODO comment.
Line 217 contains "Enbale" which should be "Enable".
-<!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 -->
+<!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4968 -->📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <!-- TODO: Enbale after fixing https://github.com/opensearch-project/sql/issues/4968 --> | |
| ```ppl ignore | |
| <!-- TODO: Enable after fixing https://github.com/opensearch-project/sql/issues/4968 --> |
🤖 Prompt for AI Agents
In docs/user/ppl/cmd/patterns.md around lines 217 to 218, fix the typo in the
TODO comment: change "Enbale" to "Enable" so the comment reads "Enable after
fixing https://github.com/opensearch-project/sql/issues/4968".
| LogicalAggregate(group=[{}], count(firstname)=[COUNT($0)], count(name)=[COUNT($1)]) | ||
| LogicalProject(firstname=[$1], name=[$10]) | ||
| LogicalProject(firstname=[$1], lastname=[$10]) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read the actual test file to verify content
if [ -f "integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml" ]; then
echo "=== File exists, reading lines 1-15 ==="
head -15 integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml
else
echo "File not found, searching for similar files..."
find . -name "*explain_count_agg*.yaml" -type f 2>/dev/null | head -5
fi
Repository: opensearch-project/sql
Length of output: 966
Update aggregation output field name to match projected field.
The LogicalAggregate at line 4 labels the output as count(name)=[COUNT($1)], but position $1 from the LogicalProject at line 5 is lastname, not name. This creates a schema mismatch where the output field name doesn't reflect the actual field being aggregated. Verify whether the aggregation label should be updated to count(lastname) to match the projection, or confirm this naming behavior is intended as part of the SqlNode conversion changes.
🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push10.yaml
around lines 4-5, the LogicalAggregate output is labeled count(name) but the
LogicalProject’s $1 maps to lastname; update the aggregate output label to
count(lastname) (i.e., change count(name)=[COUNT($1)] to
count(lastname)=[COUNT($1)]) so the aggregation field name matches the projected
field, or if the naming is intentionally preserved by SqlNode conversion, add a
comment in the test explaining the expected behavior and leave the label as-is.
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) | ||
| physical: | | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#:LogicalAggregate.NONE.[](input=RelSubset#,group={},cnt=COUNT($0)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"cnt":{"value_count":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQBQ3sKICAib3AiOiB7CiAgICAibmFtZSI6ICIrIiwKICAgICJraW5kIjogIlBMVVMiLAogICAgInN5bnRheCI6ICJCSU5BUlkiCiAgfSwKICAib3BlcmFuZHMiOiBbCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJCSUdJTlQiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp": 0,"SOURCES":[0,2],"DIGESTS":["balance",1]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) | ||
| CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[AGGREGATION->rel#1105:LogicalAggregate.NONE.[](input=RelSubset#1104,group={},cnt=COUNT($0)), LIMIT->10000], OpenSearchRequestBuilder(sourceBuilder={"from":0,"size":0,"timeout":"1m","aggregations":{"cnt":{"value_count":{"script":{"source":"{\"langType\":\"calcite\",\"script\":\"rO0ABXQBQ3sKICAib3AiOiB7CiAgICAibmFtZSI6ICIrIiwKICAgICJraW5kIjogIlBMVVMiLAogICAgInN5bnRheCI6ICJCSU5BUlkiCiAgfSwKICAib3BlcmFuZHMiOiBbCiAgICB7CiAgICAgICJkeW5hbWljUGFyYW0iOiAwLAogICAgICAidHlwZSI6IHsKICAgICAgICAidHlwZSI6ICJCSUdJTlQiLAogICAgICAgICJudWxsYWJsZSI6IHRydWUKICAgICAgfQogICAgfSwKICAgIHsKICAgICAgImR5bmFtaWNQYXJhbSI6IDEsCiAgICAgICJ0eXBlIjogewogICAgICAgICJ0eXBlIjogIkJJR0lOVCIsCiAgICAgICAgIm51bGxhYmxlIjogdHJ1ZQogICAgICB9CiAgICB9CiAgXQp9\"}","lang":"opensearch_compounded_script","params":{"utcTimestamp":1765952821451990000,"SOURCES":[0,2],"DIGESTS":["balance",1]}}}}}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)]) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for utcTimestamp usage in the codebase
rg "utcTimestamp" -C 3 --type java --type yaml -iRepository: opensearch-project/sql
Length of output: 50379
Inconsistent utcTimestamp value across test fixtures.
The utcTimestamp parameter in the base64-encoded script is a fixed test value, not derived from system time. However, this file has utcTimestamp=1765952821451990000 while all other expected output YAML files use utcTimestamp=0. Align this value with other test fixtures for consistency.
🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_count_agg_push7.yaml
around line 8, the embedded script JSON contains
utcTimestamp=1765952821451990000 which is inconsistent with other fixtures;
update the utcTimestamp value to 0 inside the embedded script payload so it
matches the other expected output files, and if the script is encoded/escaped in
the file re-generate the escaped/encoded string accordingly so the file contains
the same JSON with "utcTimestamp":0.
| LogicalSort(fetch=[5]) | ||
| LogicalProject(age=[$8]) | ||
| LogicalSort(fetch=[10]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) | ||
| LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], age=[$8], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16]) | ||
| CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]]) |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find test files that reference this expected output file
rg -l "explain_limit_10_5_push" --type=javaRepository: opensearch-project/sql
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Find the test case in ExplainIT.java that uses this expected output
rg -A 10 -B 5 "explain_limit_10_5_push" integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java
Repository: opensearch-project/sql
Length of output: 662
🏁 Script executed:
#!/bin/bash
# Find the schema definition for the opensearch-sql_test_index_account table
# Look for test data or schema setup files
find integ-test -name "*.json" -o -name "*.yaml" | xargs grep -l "opensearch-sql_test_index_account" | head -5
Repository: opensearch-project/sql
Length of output: 540
🏁 Script executed:
#!/bin/bash
# Look for account index data structure or test setup
fd "account" --type f integ-test/src/test/resources | head -20Repository: opensearch-project/sql
Length of output: 354
🏁 Script executed:
#!/bin/bash
# Read the account index mapping to verify field order
cat integ-test/src/test/resources/indexDefinitions/account_index_mapping.json | head -100
Repository: opensearch-project/sql
Length of output: 1020
🏁 Script executed:
#!/bin/bash
# Read the accounts test data to see the actual field order
cat integ-test/src/test/resources/accounts.json | head -3
Repository: opensearch-project/sql
Length of output: 314
🏁 Script executed:
#!/bin/bash
# Read the full YAML expected output to understand the complete plan structure
cat integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml
Repository: opensearch-project/sql
Length of output: 951
Correct the outer projection description.
The outer LogicalProject contains only age=[$8], not all 17 fields—the full-field projection is internal (below the inner LogicalSort(fetch=[10])). The plan correctly represents the query pipeline: scan → limit 10 → project age → limit 5. The field index age=[$8] is correct (age is the 9th field at 0-indexed position 8), and the physical plan correctly optimizes to fetch only 5 rows with the age field.
🤖 Prompt for AI Agents
In
integ-test/src/test/resources/expectedOutput/calcite/explain_limit_10_5_push.yaml
around lines 4-8, the outer LogicalProject is incorrectly showing the full
17-field projection; change it so the outer projection only lists the single
field age=[$8] (keeping the inner full-field LogicalProject under the inner
LogicalSort(fetch=[10]) intact) so the plan reads scan → LogicalSort(fetch=[10])
→ inner full projection → LogicalSort(fetch=[5]) → outer
LogicalProject(age=[$8]) to reflect limit 10 → project age → limit 5.
| // TODO: Refactor this match method | ||
| public boolean match(FunctionName functionName, List<RelDataType> argTypes) { |
Is the match() method still used?
No, it's no longer used. I removed it.
| * internally. | ||
| */ | ||
| private final ImmutableMap<BuiltinFunctionName, List<Pair<CalciteFuncSignature, FunctionImp>>> | ||
| private final ImmutableMap<BuiltinFunctionName, Pair<CalciteFuncSignature, FunctionImp>> |
Why is a FunctionRegistry still required? Is it needed because CalciteRelNodeVisitor must resolve functions during analysis and translation?
It's still used when building logical plans from PPL's AST. Yes, it's used in CalciteRelNodeVisitor. Now it only performs name-based resolution.
| import org.opensearch.sql.calcite.utils.OpenSearchTypeFactory; | ||
|
|
||
| @UtilityClass | ||
| public class ValidationUtils { |
Could you add comments/Javadoc where a function is copied from Calcite?
Yes, I annotated every method I copied from Calcite's implementation
| * }</pre> | ||
| */ | ||
| @Override | ||
| protected Result visitAntiOrSemiJoin(Join e) { |
Could adding a qualifier (e.g. t0.name) also solve the issue? Then we could avoid the ANTI JOIN handling in PplSqlToRelConverter.
Yes, adding a qualifier does solve the problem.
RCA
I investigated this query: source = opensearch-sql_test_index_state_country | where country = 'Canada' OR country = 'England' | left semi join left=a, right=b ON a.name = b.name opensearch-sql_test_index_occupation | sort a.age. Its original logical plan from CalciteRelNodeVisitor is:
LogicalSort(sort0=[$5], dir0=[ASC-nulls-first])
LogicalJoin(condition=[=($0, $6)], joinType=[semi])
LogicalProject(name=[$0], country=[$1], state=[$2], month=[$3], year=[$4], age=[$5])
LogicalFilter(condition=[SEARCH($1, Sarg['Canada':VARCHAR, 'England':VARCHAR]:VARCHAR)])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_state_country]])
LogicalProject(name=[$0], country=[$1], occupation=[$2], month=[$3], salary=[$4], year=[$5])
CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_occupation]])
However, the SQL it is translated to is:
SELECT `name`, `country`, `state`, `month`, `year`, `age`
FROM `OpenSearch`.`opensearch-sql_test_index_state_country` AS `t0`
WHERE `country` IN ('Canada', 'England') AND EXISTS (SELECT 1
FROM (SELECT `name`, `country`, `occupation`, `month`, `salary`, `year`
FROM `OpenSearch`.`opensearch-sql_test_index_occupation`) AS `t1`
WHERE `name` = `t1`.`name`) -- left `name` is unqualified
ORDER BY `age`Why is the first name unqualified
In RelToSqlConverter, the condition is initially created with a qualifier, as WHERE t0.name = t1.name:
SqlNode sqlCondition =
convertConditionToSqlNode(e.getCondition(), leftContext, rightContext);
The following lines remove the qualifier because the converter is building a correlated subquery, and in standard SQL, outer-query references in correlated subqueries are typically unqualified.
if (leftResult.neededAlias != null) {
SqlShuttle visitor =
new AliasReplacementShuttle(leftResult.neededAlias,
e.getLeft().getRowType(), sqlSelect.getSelectList());
sqlCondition = sqlCondition.accept(visitor);
}
The visitor replaces the identifiers whose first qualifier equals leftResult.neededAlias:
@Override public SqlNode visit(SqlIdentifier id) {
...
if (tableAlias.equals(id.names.get(0))) {
int index =
requireNonNull(tableType.getField(id.names.get(1), false, false),
() -> "field " + id.names.get(1) + " is not found in "
+ tableType)
.getIndex();
SqlNode selectItem = source.get(index);
if (selectItem.getKind() == SqlKind.AS) {
selectItem = ((SqlCall) selectItem).operand(0);
}
return selectItem.clone(id.getParserPosition());
...
}
This replacement shuttle exists because the alias of the outer query, t0, is not present under some conditions. However, SQL's name-resolution rules resolve an unqualified name against the innermost scope first, and such a name happens to exist in the innermost scope, so the condition ends up being resolved as t1.name = t1.name.
Resolution
It can be resolved either by removing the replacement shuttle or by enabling ANTI / SEMI join syntax. But the former solution risks hitting a non-existent t0 alias if the following condition is not met:
if (leftResult.neededAlias != null && sqlSelect.getFrom() != null) {
sqlSelect.setFrom(as(sqlSelect.getFrom(), leftResult.neededAlias));
}
(Note: the above condition was changed by CALCITE-3810: Render ANTI and SEMI join to NOT EXISTS and EXISTS in the JDBC adapter. That change may have resolved this problem, but it is not included in Calcite 1.41.)
I think adopting the second solution, enabling ANTI/SEMI join, is better:
- It's safer because we will not run into trouble when the above condition is not met.
- Spark SQL supports ANTI and SEMI JOIN.
- As we are already creating ANTI / SEMI JOIN in CalciteRelNodeVisitor, it will be more consistent if the logical plan translated back also contains them.
sql/core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
Lines 1288 to 1293 in d0195cb
if (node.getJoinType() == SEMI || node.getJoinType() == ANTI) {
  // semi and anti join only return left table outputs
  context.relBuilder.join(
      JoinAndLookupUtils.translateJoinType(node.getJoinType()), joinCondition);
  return context.relBuilder.peek();
}
| * }</pre> | ||
| */ | ||
| @Override | ||
| protected Result visitAntiOrSemiJoin(Join e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add qualifier can also solve the issue? e.g. t0.name. Then we can avoid the ANTI JOIN visit in PplSqlToRelConverter?
| @Override | ||
| public SqlKind getKind() { | ||
| return SqlKind.DIVIDE; | ||
| } |
This is to enable operator-dependent type derivation.
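As a rough illustration of what operator-dependent type derivation enables: logic that branches on SqlKind can now treat the PPL divide UDF like the built-in division operator. The helper below is only a sketch, not project code:

```java
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.sql.SqlKind;

final class DivideTypeDerivationSketch {
  private DivideTypeDerivationSketch() {}

  /** Applies division-specific typing whenever the call reports SqlKind.DIVIDE. */
  static RelDataType deriveType(RelDataTypeFactory typeFactory, RexCall call) {
    if (call.getKind() == SqlKind.DIVIDE) {
      RelDataType dividend = call.getOperands().get(0).getType();
      RelDataType divisor = call.getOperands().get(1).getType();
      // Decimal precision/scale rules that only apply to division; may return null
      // for non-decimal inputs, in which case we keep the call's own type.
      RelDataType derived =
          typeFactory.getTypeSystem().deriveDecimalDivideType(typeFactory, dividend, divisor);
      if (derived != null) {
        return derived;
      }
    }
    return call.getType();
  }
}
```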
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
402-429: Document the intentional re-registration design or prevent duplicate registrations at initialization.The code logs warnings when functions are registered multiple times but continues with silent overwrites. Since
OpenSearchExecutionEngineis instantiated per-request and callsregisterOpenSearchFunctions()in its constructor, external functions (GEOIP, DISTINCT_COUNT_APPROX) are re-registered on every request, generating repeated warnings and making actual configuration errors difficult to detect.Either:
- Document that re-registration is intentional for plugin override scenarios
- Ensure functions are registered only once during startup (e.g., in a static initializer or plugin loader rather than per-request)
- Make duplicate registrations an error to fail fast on configuration mistakes
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
💤 Files with no reviewable changes (1)
- core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.javacore/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/executor/QueryService.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
PPLOperandTypes(20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (14)
core/src/main/java/org/opensearch/sql/executor/QueryService.java (2)
9-62: LGTM: Imports align with validation pipeline.The new imports for Calcite's validation infrastructure (SqlValidator, SqlNode, SqlToRelConverter, etc.) and PPL-specific validation components are appropriate for the validation pipeline being introduced.
124-125: LGTM: Clean validation integration.The validation step is consistently integrated into both execute and explain paths, positioned logically between filter merging and optimization.
Also applies to: 166-167
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (4)
1-58: LGTM: Well-structured class declaration.The class follows proper patterns with a factory method, extends the appropriate Calcite base class, and includes clear documentation.
81-90: LGTM: Proper UDT handling.The override correctly identifies cases where UDTs require casting from character types, addressing a gap in the base implementation.
92-141: LGTM: Datetime and IP comparison handling.The custom comparison type resolution correctly handles datetime cross-type comparisons and IP-to-string comparisons as described in the PR objectives. The nullability is properly preserved.
Note: The previous review comment about using
&&instead of&on line 120 has been addressed.
143-181: LGTM: Robust operand coercion.The method properly handles dynamic parameters, nullability synchronization, and uses SAFE_CAST to avoid malformed data errors as mentioned in the PR objectives.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (8)
256-256: LGTM: Import additions are appropriate.The new imports (
LocaleandOpenSearchTypeUtil) are properly used in the code.Localeis used for internationalized string formatting, andOpenSearchTypeUtilsupports the runtime type detection for ADD operations.Also applies to: 284-284
359-394: LGTM: Registry simplification aligns with Calcite validation migration.The migration from signature-based function lists to single-implementation-per-function simplifies the registry structure. Thread safety is maintained: immutable maps for internal registries and
ConcurrentHashMapfor external ones.
431-478: LGTM: Resolution logic correctly simplified.The resolution logic is now straightforward—direct name-based lookup and invocation. Type checking has been migrated to Calcite's validation phase (before resolution), eliminating the need for signature matching loops. The
compulsoryCastmethod remains as a documented ad-hoc workaround for the REDUCE function's specific casting requirements.
501-560: LGTM: Type checker migration to Calcite completed correctly.The migration from
PPLTypeCheckertoSqlOperandTypeCheckeris properly implemented. TheextractTypeCheckerFromUDFhelper correctly handles both user-defined and built-in operators. Note that some registrations passnullas the type checker (e.g., forJSON_ARRAY,JSON_OBJECT,TYPEOF), which is acceptable for functions with custom validation logic.
562-703: LGTM: Standard operator registration follows correct patterns.The registration of comparison and standard operators correctly uses Calcite's type checkers. The
SUBTRACToperator properly supports both numeric and datetime operands (for timestamp binning). The use of composite type checkers with.or()follows Calcite conventions.
811-867: LGTM: Function implementations handle edge cases correctly.The
TOSTRING,MVJOIN,SPLIT, andMVINDEXimplementations properly handle their respective requirements:
TOSTRINGsupports both 1-arg (cast) and 2-arg (format) formsSPLITcorrectly handles the empty delimiter edge case usingREGEXP_EXTRACT_ALLMVINDEXsupports both 2-arg and 3-arg variants with appropriate type checking
926-1228: LGTM: Remaining function registrations and builder classes are consistent.The rest of the
populate()method and builder classes follow consistent patterns:
- Type checkers properly use Calcite's
OperandTypesandSqlTypeFamilyIS_NOT_NULL/IS_NULLcorrectly useIGNOREfamily to support struct types (good fix)- Internal builder throws on duplicates (deterministic), while external registry warns (flexible for plugins)—this distinction is appropriate
- Aggregate function registrations consistently use
SqlOperandTypeChecker
914-925: Document the ADD operator's dual-behavior type checking pattern.The ADD operator uses
SqlStdOperatorTable.PLUS.getOperandTypeChecker()but implements runtime type detection to choose between CONCAT (for strings) and PLUS (for numbers). While the inline comment explains this is intentional to avoid confusion during function resolution, add a Javadoc comment or expanded code comment explaining:
- Why PLUS's type checker is used despite ADD accepting character types
- Whether Calcite's validation phase permits string operands to reach this runtime resolution, or if there's custom validator logic that handles this case
- Reference this as a Calcite-specific workaround pattern for dual-behavior operators
| @Override | ||
| public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) { | ||
| RelDataType casted = super.implicitCast(in, expected); | ||
| if (casted == null) { | ||
| // String -> DATETIME is converted to String -> TIMESTAMP | ||
| if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) { | ||
| return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP); | ||
| } | ||
| return null; | ||
| } | ||
| return switch (casted.getSqlTypeName()) { | ||
| case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY -> | ||
| createUDTWithAttributes(factory, in, casted.getSqlTypeName()); | ||
| default -> casted; | ||
| }; | ||
| } |
Replace switch expression with Java 11 compatible syntax.
The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x.
🔎 Proposed fix using traditional switch statement
- return switch (casted.getSqlTypeName()) {
- case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
- createUDTWithAttributes(factory, in, casted.getSqlTypeName());
- default -> casted;
- };
+ switch (casted.getSqlTypeName()) {
+ case DATE:
+ case TIME:
+ case TIMESTAMP:
+ case BINARY:
+ return createUDTWithAttributes(factory, in, casted.getSqlTypeName());
+ default:
+ return casted;
+ }
As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 60 to 75, replace the Java 14+ switch expression with a Java
11-compatible traditional switch statement: compute the result into a local
RelDataType variable (e.g., result), use a traditional
switch(casted.getSqlTypeName()) with case labels for DATE, TIME, TIMESTAMP,
BINARY that set result = createUDTWithAttributes(factory, in,
casted.getSqlTypeName()); and a default that sets result = casted; then return
result; ensuring all code paths assign result before the return.
| private static SqlNode castTo(SqlNode node, RelDataType type) { | ||
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | ||
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | ||
| return switch (exprType) { | ||
| case ExprCoreType.DATE -> | ||
| PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.TIMESTAMP -> | ||
| PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.TIME -> | ||
| PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | ||
| case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | ||
| default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); | ||
| }; | ||
| } | ||
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | ||
| return SqlLibraryOperators.SAFE_CAST.createCall( | ||
| node.getParserPosition(), | ||
| node, | ||
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | ||
| } |
Replace switch expression with Java 11 compatible syntax.
The switch expression on lines 186-195 uses Java 14+ syntax. This needs to be converted for Java 11 compatibility.
🔎 Proposed fix using traditional switch statement
private static SqlNode castTo(SqlNode node, RelDataType type) {
if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
- return switch (exprType) {
- case ExprCoreType.DATE ->
- PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
- case ExprCoreType.TIMESTAMP ->
- PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
- case ExprCoreType.TIME ->
- PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
- case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
- default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
- };
+ if (exprType == ExprCoreType.DATE) {
+ return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.TIMESTAMP) {
+ return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.TIME) {
+ return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
+ } else if (exprType == ExprCoreType.IP) {
+ return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
+ } else {
+ throw new UnsupportedOperationException("Unsupported type: " + exprType);
+ }
}
As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| private static SqlNode castTo(SqlNode node, RelDataType type) { | |
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | |
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | |
| return switch (exprType) { | |
| case ExprCoreType.DATE -> | |
| PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.TIMESTAMP -> | |
| PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.TIME -> | |
| PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | |
| case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | |
| default -> throw new UnsupportedOperationException("Unsupported type: " + exprType); | |
| }; | |
| } | |
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | |
| return SqlLibraryOperators.SAFE_CAST.createCall( | |
| node.getParserPosition(), | |
| node, | |
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | |
| } | |
| private static SqlNode castTo(SqlNode node, RelDataType type) { | |
| if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) { | |
| ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type); | |
| if (exprType == ExprCoreType.DATE) { | |
| return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.TIMESTAMP) { | |
| return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.TIME) { | |
| return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node); | |
| } else if (exprType == ExprCoreType.IP) { | |
| return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node); | |
| } else { | |
| throw new UnsupportedOperationException("Unsupported type: " + exprType); | |
| } | |
| } | |
| // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted | |
| return SqlLibraryOperators.SAFE_CAST.createCall( | |
| node.getParserPosition(), | |
| node, | |
| SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable())); | |
| } |
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
around lines 183 to 202, the code uses a Java 14+ switch expression for mapping
ExprType to SqlNode; replace it with a Java 11-compatible traditional switch
statement: declare a SqlNode result variable, switch on exprType with case
labels that assign result = PPLBuiltinOperators.X.createCall(...); include a
default case that throws the same UnsupportedOperationException, then break each
case and after the switch return result; keep the existing parser position and
node arguments and preserve surrounding logic (including the SAFE_CAST return
for non-datetime/ip types).
| /** | ||
| * Validates a RelNode by converting it to SqlNode, performing validation, and converting back. | ||
| * | ||
| * <p>This process enables Calcite's type validation and implicit casting mechanisms to work on | ||
| * PPL queries. | ||
| * | ||
| * @param relNode the relation node to validate | ||
| * @param context the Calcite plan context containing the validator | ||
| * @return the validated (and potentially modified) relation node | ||
| */ | ||
| private RelNode validate(RelNode relNode, CalcitePlanContext context) { | ||
| SkipRelValidationShuttle skipShuttle = new SkipRelValidationShuttle(); | ||
| relNode.accept(skipShuttle); | ||
| if (skipShuttle.shouldSkipValidation()) { | ||
| return relNode; | ||
| } | ||
| // Fix interval literals before conversion to SQL | ||
| RelNode sqlRelNode = relNode.accept(new PplRelToSqlRelShuttle(context.rexBuilder, true)); | ||
|
|
||
| // Convert RelNode to SqlNode for validation | ||
| RelToSqlConverter rel2sql = new PplRelToSqlNodeConverter(OpenSearchSparkSqlDialect.DEFAULT); | ||
| SqlImplementor.Result result = rel2sql.visitRoot(sqlRelNode); | ||
| SqlNode root = result.asStatement(); | ||
|
|
||
| // Rewrite SqlNode to remove database qualifiers | ||
| SqlNode rewritten = root.accept(new SqlRewriteShuttle()); | ||
| SqlValidator validator = context.getValidator(); | ||
| try { | ||
| validator.validate(Objects.requireNonNull(rewritten)); | ||
| } catch (CalciteContextException e) { | ||
| if (ValidationUtils.tolerantValidationException(e)) { | ||
| return relNode; | ||
| } | ||
| throw new ExpressionEvaluationException(e.getMessage(), e); | ||
| } | ||
|
|
||
| SqlToRelConverter.Config sql2relConfig = | ||
| SqlToRelConverter.config() | ||
| // Do not remove sort in subqueries so that the orders for queries like `... | sort a | ||
| // | fields b` is preserved | ||
| .withRemoveSortInSubQuery(false) | ||
| // Disable automatic JSON_TYPE_OPERATOR wrapping for nested JSON functions. | ||
| // See CALCITE-4989: Calcite wraps nested JSON functions with JSON_TYPE by default | ||
| .withAddJsonTypeOperatorEnabled(false) | ||
| // Set hint strategy so that hints can be properly propagated. | ||
| // See SqlToRelConverter.java#convertSelectImpl | ||
| .withHintStrategyTable(PPLHintStrategyTable.getHintStrategyTable()); | ||
| SqlToRelConverter sql2rel = | ||
| new PplSqlToRelConverter( | ||
| context.config.getViewExpander(), | ||
| validator, | ||
| validator.getCatalogReader().unwrap(CalciteCatalogReader.class), | ||
| context.relBuilder.getCluster(), | ||
| PplConvertletTable.INSTANCE, | ||
| sql2relConfig); | ||
| // Convert the validated SqlNode back to RelNode | ||
| RelNode validatedRel = sql2rel.convertQuery(rewritten, false, true).project(); | ||
| return validatedRel.accept(new PplRelToSqlRelShuttle(context.rexBuilder, false)); | ||
| } |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find ValidationUtils.java file
find . -name "ValidationUtils.java" -type f 2>/dev/null
# Search for tolerantValidationException method usage
rg -n "tolerantValidationException" --type=java -B2 -A5Repository: opensearch-project/sql
Length of output: 1922
🏁 Script executed:
#!/bin/bash
# List all Java files in executor module that might contain ValidationUtils
fd "ValidationUtils" . --type f --extension java
# Also search for the method definition and all its usages
rg -n "tolerantValidationException\s*\(" --type=java -A3 -B2Repository: opensearch-project/sql
Length of output: 1782
🏁 Script executed:
#!/bin/bash
# Find all Validation-related classes
fd -i "validation" . --type f --extension java | head -20
# Search for CalciteContextException handling
rg "CalciteContextException" --type=java -n -B3 -A5
# Look for logging patterns in the validate method
rg -n "log\." core/src/main/java/org/opensearch/sql/executor/QueryService.java 2>/dev/null || echo "File check"Repository: opensearch-project/sql
Length of output: 9799
🏁 Script executed:
#!/bin/bash
# Find the QueryService.java file and examine the validate method context
rg -n "class QueryService" --type=java -A200 | grep -A50 "private RelNode validate"
# Find ValidationUtils and tolerantValidationException
rg -rn "class ValidationUtils" --type=java
# Check for any logging configuration related to validation
rg -n "logger\|log\.info\|log\.debug" core/src/main/java/org/opensearch/sql/executor/QueryService.javaRepository: opensearch-project/sql
Length of output: 152
🏁 Script executed:
#!/bin/bash
# Search for ValidationUtils implementation
find . -path "*/src/*" -name "*.java" -exec grep -l "ValidationUtils" {} \;
# Get the file location and content
rg -n "class ValidationUtils|static.*tolerantValidationException" --type=java --max-count=5Repository: opensearch-project/sql
Length of output: 1172
🏁 Script executed:
#!/bin/bash
# Find all files mentioning ValidationUtils
rg -l "ValidationUtils" --type=java | head -10
# For each file, search for the method
for file in $(rg -l "ValidationUtils" --type=java | head -5); do
echo "=== File: $file ==="
rg -A10 "tolerantValidationException" "$file"
done
Repository: opensearch-project/sql
Length of output: 1416
🏁 Script executed:
#!/bin/bash
# Find where QueryService.java is and look at the surrounding code
find . -name "QueryService.java" -type f 2>/dev/null | head -1 | xargs -I {} bash -c 'wc -l {} && echo "File found" && grep -n "tolerantValidationException" {}'
# Search for ValidationUtils definition
rg -n "class ValidationUtils" --type=java
rg -n "static.*boolean.*tolerantValidationException" --type=javaRepository: opensearch-project/sql
Length of output: 468
🏁 Script executed:
#!/bin/bash
# Find ValidationUtils class
rg -l "class ValidationUtils" --type=java
# Show the full implementation
rg -A 30 "static.*tolerantValidationException" --type=java
Repository: opensearch-project/sql
Length of output: 1202
🏁 Script executed:
#!/bin/bash
# Get the complete ValidationUtils.java file
cat core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java
Repository: opensearch-project/sql
Length of output: 7184
🏁 Script executed:
#!/bin/bash
# Get more context around the tolerantValidationException method
head -170 core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java | tail -30
Repository: opensearch-project/sql
Length of output: 1190
Add logging when validation is skipped for known Calcite compatibility issues.
The tolerant validation exception handling is well-documented and properly scoped to specific Calcite bugs (aggregate nesting and windowed aggregates in GROUP BY). However, when the fallback to the original RelNode occurs at line 325, there is no logging to track when queries bypass validation. Add a debug or warn log statement to improve observability and aid troubleshooting when these edge cases are encountered.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/executor/QueryService.java around lines
294 to 352, when validator.validate(...) throws a tolerant Calcite exception the
code currently returns the original relNode without any observability; add a log
entry (debug or warn based on severity policy) inside the catch branch where
ValidationUtils.tolerantValidationException(e) is true before returning relNode,
including the exception message and context (e.g., which validation rule was
tolerated and a short identifier of the relNode or query) so operators can see
when validation was skipped.
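A sketch of the suggested change, assuming a Log4j2 logger on QueryService (field name, log level, and message wording are illustrative):

```java
// Field on QueryService:
private static final Logger LOG = LogManager.getLogger(QueryService.class);

// Inside validate(RelNode, CalcitePlanContext):
try {
  validator.validate(Objects.requireNonNull(rewritten));
} catch (CalciteContextException e) {
  if (ValidationUtils.tolerantValidationException(e)) {
    // Make the validation bypass observable instead of silently falling back.
    LOG.debug("Skipping Calcite validation due to a known tolerated issue: {}", e.getMessage(), e);
    return relNode;
  }
  throw new ExpressionEvaluationException(e.getMessage(), e);
}
```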
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
|
@coderabbitai generate docstrings |
✅ Actions performed
Initiated docstring generation; will generate only if new commits exist.
|
Note: Docstrings generation - SUCCESS
Docstrings generation was requested by @yuancu. * #4892 (comment) The following files were modified: * `core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java` * `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java` * `core/src/main/java/org/opensearch/sql/calcite/CalciteRexNodeVisitor.java` * `core/src/main/java/org/opensearch/sql/calcite/ExtendedRexBuilder.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java` * `core/src/main/java/org/opensearch/sql/calcite/utils/binning/BinnableField.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplConvertletTable.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/SqlOperatorTableProvider.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/ValidationUtils.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplRelToSqlNodeConverter.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/converters/PplSqlToRelConverter.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/PplRelToSqlRelShuttle.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SkipRelValidationShuttle.java` * `core/src/main/java/org/opensearch/sql/calcite/validate/shuttles/SqlRewriteShuttle.java` * `core/src/main/java/org/opensearch/sql/executor/QueryService.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ExistsFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/FilterFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ForallFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MapRemoveFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ReduceFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java` * `core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java` * `core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java` * `core/src/main/java/org/opensearch/sql/expression/function/UserDefinedFunctionBuilder.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonAppendFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonDeleteFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtendFunctionImpl.java` * 
`core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractAllFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonExtractFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonKeysFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/RelevanceQueryFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/SpanFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/binning/WidthBucketFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/DivideFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ModFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMaxFunction.java` * `core/src/main/java/org/opensearch/sql/expression/function/udf/math/ScalarMinFunction.java` * `core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/clickbench/PPLClickBenchIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAppendcolIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLExplainIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLPatternsIT.java` * `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java` * `integ-test/src/test/java/org/opensearch/sql/ppl/DateTimeFunctionIT.java` * `integ-test/src/test/java/org/opensearch/sql/ppl/ExplainIT.java`
Actionable comments posted: 0
♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)
60-75: Replace switch expression with Java 11-compatible syntax. The switch expression on lines 70-74 uses Java 14+ syntax, but the coding guidelines require Java 11 compatibility for OpenSearch 2.x. This issue was previously flagged but has not yet been addressed.
🔎 Proposed fix using traditional switch statement
@Override
public @Nullable RelDataType implicitCast(RelDataType in, SqlTypeFamily expected) {
  RelDataType casted = super.implicitCast(in, expected);
  if (casted == null) {
    // String -> DATETIME is converted to String -> TIMESTAMP
    if (OpenSearchTypeUtil.isCharacter(in) && expected == SqlTypeFamily.DATETIME) {
      return createUDTWithAttributes(factory, in, OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP);
    }
    return null;
  }
- return switch (casted.getSqlTypeName()) {
-   case SqlTypeName.DATE, SqlTypeName.TIME, SqlTypeName.TIMESTAMP, SqlTypeName.BINARY ->
-       createUDTWithAttributes(factory, in, casted.getSqlTypeName());
-   default -> casted;
- };
+ switch (casted.getSqlTypeName()) {
+   case DATE:
+   case TIME:
+   case TIMESTAMP:
+   case BINARY:
+     return createUDTWithAttributes(factory, in, casted.getSqlTypeName());
+   default:
+     return casted;
+ }
}

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
183-202: Replace switch expression with Java 11-compatible syntax. The switch expression on lines 186-195 uses Java 14+ syntax and needs to be converted for Java 11 compatibility. This issue was previously flagged but has not yet been addressed.
🔎 Proposed fix using if-else chain
 private static SqlNode castTo(SqlNode node, RelDataType type) {
   if (OpenSearchTypeUtil.isDatetime(type) || OpenSearchTypeUtil.isIp(type)) {
     ExprType exprType = OpenSearchTypeFactory.convertRelDataTypeToExprType(type);
-    return switch (exprType) {
-      case ExprCoreType.DATE ->
-          PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
-      case ExprCoreType.TIMESTAMP ->
-          PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
-      case ExprCoreType.TIME ->
-          PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
-      case ExprCoreType.IP -> PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
-      default -> throw new UnsupportedOperationException("Unsupported type: " + exprType);
-    };
+    if (exprType == ExprCoreType.DATE) {
+      return PPLBuiltinOperators.DATE.createCall(node.getParserPosition(), node);
+    } else if (exprType == ExprCoreType.TIMESTAMP) {
+      return PPLBuiltinOperators.TIMESTAMP.createCall(node.getParserPosition(), node);
+    } else if (exprType == ExprCoreType.TIME) {
+      return PPLBuiltinOperators.TIME.createCall(node.getParserPosition(), node);
+    } else if (exprType == ExprCoreType.IP) {
+      return PPLBuiltinOperators.IP.createCall(node.getParserPosition(), node);
+    } else {
+      throw new UnsupportedOperationException("Unsupported type: " + exprType);
+    }
   }
   // Use SAFE_CAST instead of CAST to avoid throwing errors when numbers are malformatted
   return SqlLibraryOperators.SAFE_CAST.createCall(
       node.getParserPosition(),
       node,
       SqlTypeUtil.convertTypeToSpec(type).withNullable(type.isNullable()));
 }

As per coding guidelines, maintain Java 11 compatibility when possible for OpenSearch 2.x.
🧹 Nitpick comments (3)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)
56-58: Consider adding JavaDoc to the public constructor. While the factory method above has good documentation, coding guidelines require JavaDoc on all public methods. A brief description would improve API documentation completeness.
📝 Suggested JavaDoc
+ /**
+  * Constructs a PplTypeCoercion instance.
+  *
+  * @param typeFactory the type factory
+  * @param validator the SQL validator
+  */
  public PplTypeCoercion(RelDataTypeFactory typeFactory, SqlValidator validator) {
    super(typeFactory, validator);
  }

As per coding guidelines, all public classes and methods must have proper JavaDoc.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
402-408: Consider making duplicate registration warnings more actionable. The duplicate registration warning is helpful, but could guide developers on next steps.
🔎 Suggested enhancement to warning message
 if (externalFunctionRegistry.containsKey(functionName)) {
   logger.warn(
-      String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
+      String.format(
+          Locale.ROOT,
+          "Function %s is already registered and will be overwritten. Consider using a unique function name.",
+          functionName));
 }
914-925: Improve comment clarity for the ADD operator runtime selection logic. The dynamic selection between CONCAT and PLUS based on operand types should be clarified. Since RexNode.getType() is guaranteed to return non-null (per Calcite contract), the current implementation is safe without additional filtering. However, a comment explaining the runtime selection rationale would improve maintainability.
Consider expanding the comment to explain that CONCAT is selected when all operands are character types, otherwise PLUS is used for numeric addition. This clarifies the intent for future maintainers; one possible wording is sketched below.
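A possible wording, sketched against the ADD registration lambda quoted later in this thread; the added comment lines are ours, the code itself is unchanged from the PR:

+ // ADD ("+") picks its operator when the call is built: if every operand is a character
+ // type, CONCAT (string concatenation) is used; otherwise PLUS (numeric addition) is used.
  FunctionImp add =
      (builder, args) -> {
        SqlOperator op =
            (Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
                ? SqlStdOperatorTable.CONCAT
                : SqlStdOperatorTable.PLUS;
        return builder.makeCall(op, args);
      };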
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
- core/src/main/java/org/opensearch/sql/executor/QueryService.java
- core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
- core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
💤 Files with no reviewable changes (1)
- core/src/main/java/org/opensearch/sql/expression/function/CalciteFuncSignature.java
🚧 Files skipped from review as they are similar to previous changes (1)
- core/src/main/java/org/opensearch/sql/executor/QueryService.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Maintain Java 11 compatibility when possible for OpenSearch 2.x
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧬 Code graph analysis (3)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
PPLFuncImpTable(291-1269)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
OpenSearchTypeFactory(62-394)
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
PplTypeCoercionRule(45-107)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
core/src/main/java/org/opensearch/sql/calcite/utils/UserDefinedFunctionUtils.java (1)
UserDefinedFunctionUtils(52-320)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (1)
PPLOperandTypes(20-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
- GitHub Check: WhiteSource Security Check
🔇 Additional comments (19)
core/src/test/java/org/opensearch/sql/expression/function/AggFunctionTestBase.java (5)
19-28: LGTM! Simplified registry access. The change from Pair<?, AggHandler> to direct AggHandler storage simplifies the API and eliminates unnecessary wrapping. The reflection-based approach is appropriate for test utilities that need to verify internal registry state.
30-39: Consistent with simplified registry type. The method correctly uses the new Map<BuiltinFunctionName, AggHandler> type and directly accesses the handler without Pair extraction. The assertions remain appropriate and meaningful.
47-54: Direct handler access is cleaner. Line 50 demonstrates the benefit of the simplified registry: direct assignment of AggHandler without needing to extract from a Pair. This makes the code more straightforward and easier to understand.
56-70: Consistent type updates. Both methods correctly use the updated Map<BuiltinFunctionName, AggHandler> type. Since these methods primarily operate on registry keys and size, the value type simplification has minimal impact while maintaining consistency across the test base.
72-98: Unaffected helper methods remain correct. These utility methods operate independently of the aggregate function registry and are unaffected by the type simplification. They continue to provide useful validation for function name resolution and enum mapping.
core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (5)
36-54: LGTM! The class structure and factory method are well-designed. The JavaDoc clearly explains the purpose and the blacklist approach for PPL-specific type coercion.
77-90: LGTM! The special handling for user-defined types (UDTs) is well-documented and necessary to prevent UDTs from being incorrectly treated as character types during cast validation.
92-111: LGTM! The datetime-string equality handling is well-implemented, correctly managing bidirectional comparisons and preserving UDT types when appropriate.
113-141: LGTM! The common type resolution for binary comparisons correctly implements PPL-specific rules (a small illustration follows the list):
- Date/time combinations coerce to timestamp
- IP/string comparisons coerce to IP
- Nullability is properly preserved
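To make the first rule concrete, here is a tiny standalone illustration of our own, assuming the conventional promotion of a DATE to a TIMESTAMP at midnight; it does not use any of the PR's classes:

import java.time.LocalDate;
import java.time.LocalDateTime;

// Comparing a DATE with a TIMESTAMP once both sides are coerced to TIMESTAMP:
// the DATE operand is widened to a timestamp at 00:00:00 of that day.
public class DateToTimestampCoercionDemo {
  public static void main(String[] args) {
    LocalDate date = LocalDate.of(2024, 5, 1);                     // DATE-like operand
    LocalDateTime timestamp = LocalDateTime.of(2024, 5, 1, 8, 30); // TIMESTAMP-like operand
    LocalDateTime widenedDate = date.atStartOfDay();               // DATE -> TIMESTAMP
    System.out.println(widenedDate.isBefore(timestamp));           // prints: true
  }
}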
143-181: LGTM! The operand coercion logic is well-structured with proper handling of:
- Dynamic parameters (correctly skipped)
- Attribute synchronization
- SAFE_CAST nullability propagation
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (9)
359-394: Registry simplification aligns well with Calcite validation migration. The migration from List<Pair<CalciteFuncSignature, FunctionImp>> to a single FunctionImp per function name is a significant architectural improvement that complements the new Calcite-based type checking and coercion. The simplified initialization logic using putAll is cleaner and more maintainable.
Based on learnings, this follows the pattern of leveraging Calcite's built-in mechanisms rather than maintaining custom parallel systems.
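For readers less familiar with this registry model, a minimal self-contained sketch of the behaviour described here (class and method names are simplified placeholders, not the PR's actual API):

import java.util.HashMap;
import java.util.Map;

// Simplified sketch: one implementation per function name, external registrations
// override built-ins, internal duplicates throw, external duplicates only warn.
class FunctionRegistrySketch {
  @FunctionalInterface
  interface Impl {
    Object apply(Object... args);
  }

  private final Map<String, Impl> builtins = new HashMap<>();
  private final Map<String, Impl> external = new HashMap<>();

  void registerBuiltin(String name, Impl impl) {
    if (builtins.putIfAbsent(name, impl) != null) {
      throw new IllegalStateException("Duplicate internal registration: " + name);
    }
  }

  void registerExternal(String name, Impl impl) {
    if (external.containsKey(name)) {
      System.err.printf("Function %s is registered multiple times%n", name);
    }
    external.put(name, impl); // last registration wins
  }

  Impl resolve(String name) {
    Impl impl = external.getOrDefault(name, builtins.get(name));
    if (impl == null) {
      throw new IllegalStateException("Unknown function: " + name);
    }
    return impl; // external implementations take precedence over built-ins
  }
}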
460-478: Simplified resolution logic is cleaner and more performant. The removal of signature list iteration and coercion logic at this layer makes sense given that type checking and coercion are now handled by Calcite's validation pipeline. The direct implementation lookup is more efficient.
504-560: Type checker migration to SqlOperandTypeChecker is well-structured. The updated signatures using SqlOperandTypeChecker and the helper method extractTypeCheckerFromUDF provide clean integration with Calcite's type system. The direct use of OperandTypes.family() in registerDivideFunction is more maintainable than wrapped custom type checkers.
813-821: TOSTRING implementation correctly handles both signatures. The inlined implementation cleanly handles the 1-argument case (direct cast to VARCHAR) and 2-argument case (delegates to PPLBuiltinOperators.TOSTRING). The type checker using .or() correctly validates both signatures.
1082-1110: Builder simplifications are consistent with the single-implementation model. The Builder and AggBuilder now enforce single registration per function name, which is appropriate for the new architecture. The exception thrown on duplicate internal registration (line 1091) correctly prevents configuration errors during initialization.
1137-1191: Aggregate function registrations correctly use type checkers. The aggregate registrations properly extract type checkers from UDFs using extractTypeCheckerFromUDF() and use appropriate OperandTypes for built-in aggregates. The MEDIAN implementation's reuse of PERCENTILE_APPROX with a fixed percentile value is elegant.
1246-1268: Excellent JavaDoc for extractTypeCheckerFromUDF. The comprehensive JavaDoc clearly explains the purpose of unwrapping UDFOperandMetadata to extract the underlying SqlOperandTypeChecker and references the relevant PPLOperandTypes class. This documentation is valuable given the method's increased usage across the refactored registration logic.
994-1009: The SUBSTRING and SUBSTR type checkers are correctly implemented and documented. Both functions properly register with custom type checkers using OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER).or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.INTEGER, SqlTypeFamily.INTEGER)) to support both 2-arg and 3-arg variants. This aligns with Calcite's SUBSTRING(string FROM start [FOR length]) pattern and matches OpenSearch PPL's substring(str, start) or substring(str, start, length) syntax. The inline comment in the code already documents that manual type checker creation is necessary since SqlStdOperatorTable.SUBSTRING.getOperandTypeChecker() is null. Test coverage exists for both variants (e.g., AstBuilderTest.java validates the 3-arg form, SemanticAnalyzerScalarFunctionTest.java covers multiple cases).
564-581: Verify OperandTypes type family handling with OpenSearch datetime and temporal types. The direct OperandTypes.family(SqlTypeFamily.DATETIME, SqlTypeFamily.DATETIME) registration at lines 580-581 is correct. OpenSearch-specific UDT types (TIMESTAMP, TIME, DATE, BINARY) are properly mapped to SqlTypes through the PplValidator's type derivation layer (sqlTypeToUserDefinedType() in PplValidator.java and createUDTWithAttributes() in ValidationUtils.java), which handles the integration between Calcite's standard type families and OpenSearch's custom types during validation.
…king is deferred to the validation phase against the finished logical plan Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
397-424: Add null validation for external registration parameters. Both registerExternalOperator and registerExternalAggOperator should validate that the operator and aggFunction parameters are not null before registration. Currently, null operators would cause an NPE during function resolution.
🔎 Proposed null checks
 public void registerExternalOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+  if (operator == null) {
+    throw new IllegalArgumentException("Operator cannot be null");
+  }
   if (externalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(Locale.ROOT, "Function %s is registered multiple times", functionName));
   }
   externalFunctionRegistry.put(functionName, (builder, args) -> builder.makeCall(operator, args));
 }

 public void registerExternalAggOperator(
     BuiltinFunctionName functionName, SqlUserDefinedAggFunction aggFunction) {
+  if (aggFunction == null) {
+    throw new IllegalArgumentException("AggFunction cannot be null");
+  }
   if (aggExternalFunctionRegistry.containsKey(functionName)) {
     logger.warn(
         String.format(
             Locale.ROOT, "Aggregate function %s is registered multiple times", functionName));
   }
🧹 Nitpick comments (4)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)
48-52: Consider extracting UDT type-checking pattern. Multiple methods repeat the pattern of checking isUserDefinedType, casting to AbstractExprRelDataType<?>, and then accessing getUdt() or getExprType(). While not critical, consider extracting this pattern to reduce duplication.
💡 Example refactor
Add a private helper method:
private static <T> Optional<T> getUdtAttribute(
    RelDataType type, Function<AbstractExprRelDataType<?>, T> extractor) {
  if (isUserDefinedType(type)) {
    return Optional.of(extractor.apply((AbstractExprRelDataType<?>) type));
  }
  return Optional.empty();
}
Then refactor methods like isDate:
public static boolean isDate(RelDataType type) {
  return getUdtAttribute(type, AbstractExprRelDataType::getUdt)
      .map(udt -> udt == OpenSearchTypeFactory.ExprUDT.EXPR_DATE)
      .orElse(SqlTypeName.DATE.equals(type.getSqlTypeName()));
}
Note: This is a stylistic suggestion; the current implementation is also acceptable.
Also applies to: 72-78, 93-96, 111-115, 130-134, 156-158, 172-175
33-55: Clarify JavaDoc for mixed numeric/character checking. The method checks for both numeric and character types among built-in SQL types, but only numeric types among UDTs. The JavaDoc could make this asymmetry more explicit to avoid confusion.
🔎 Suggested clarification
 /**
- * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types
- * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric
- * types, and string types (VARCHAR, CHAR).
+ * Checks if the RelDataType represents a numeric or character type.
+ *
+ * <p>For built-in SQL types, this method checks both numeric types (INTEGER, BIGINT, SMALLINT,
+ * TINYINT, FLOAT, DOUBLE, DECIMAL, REAL) and character types (VARCHAR, CHAR). For OpenSearch
+ * UDTs, only numeric types are checked (character UDTs like VARCHAR-mapped types are excluded).
  *
  * @param fieldType the RelDataType to check
- * @return true if the type is numeric or string, false otherwise
+ * @return true if the type is numeric (built-in or UDT) or character (built-in only), false
+ *     otherwise
  */
 public static boolean isNumericOrCharacter(RelDataType fieldType) {
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (2)
479-493: Add bounds checking in compulsoryCast. The REDUCE case accesses args[1] and args[2] without verifying that the array has at least 3 elements. While Calcite's validation should ensure correct arity, defensive bounds checking would prevent a potential ArrayIndexOutOfBoundsException and provide clearer error messages.
🔎 Proposed bounds check
 switch (functionName) {
   case BuiltinFunctionName.REDUCE:
+    if (args.length < 3) {
+      throw new IllegalArgumentException(
+          String.format("REDUCE requires at least 3 arguments, got %d", args.length));
+    }
     // Set the second argument to the return type of the lambda function, so that
     // code generated with linq4j can correctly accumulate the result.
     RexLambda call = (RexLambda) args[2];
     args[1] = builder.makeCast(call.getType(), args[1], true, true);
     break;
286-1114: Consider adding class-level JavaDoc and monitoring file size. The PPLFuncImpTable class serves as a central registry for PPL function implementations and plays a critical role in the Calcite integration. Consider adding comprehensive class-level JavaDoc that explains:
- The migration from multi-signature to single-implementation model
- How Calcite's validation handles type checking and coercion
- The relationship between internal and external registries
- When to use registerExternalOperator vs internal registration
The file currently exceeds 1000 lines, which may warrant future refactoring to extract operator groups (e.g., date/time operators, string operators) into separate builder classes.
As per coding guidelines, classes over 500 lines may need organization review.
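If it helps, one possible shape for such a class-level comment, drafted only from the bullet points above (the wording is ours, not from the PR):

/**
 * Central registry mapping each {@code BuiltinFunctionName} to a single Calcite-based
 * implementation ({@code FunctionImp}). Operand type checking and implicit coercion are
 * delegated to Calcite's validator instead of being resolved per signature here. External,
 * data-store-specific operators can be registered at runtime and take precedence over the
 * built-in entries.
 */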
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
- core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (21, unit)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (7)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
16-31: Well-designed utility class for UDT-aware type checking. The utility class provides a clean, centralized API for type checking with proper UDT support. The design follows established patterns (check built-in types first, then UDTs) and integrates well with Calcite's SqlTypeUtil. This aligns with the PR's objectives to support UDTs during Calcite validation.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (6)
455-473: LGTM! External function override logic is correct. The resolution logic correctly prioritizes external function implementations over internal ones, allowing data-storage-dependent implementations to override built-in functions. The null check ensures fail-fast behavior for unregistered functions.
752-760: LGTM! TOSTRING implementation handles both single and multi-arg cases correctly. The implementation appropriately uses Calcite's CAST for the simple single-argument case and delegates to a custom operator for more complex scenarios. The nullable VARCHAR type is correctly specified.
769-794: LGTM! SPLIT implementation elegantly handles the empty delimiter edge case. The use of a CASE expression to conditionally choose between character splitting (via REGEXP_EXTRACT_ALL with '.') and standard delimiter splitting is a clean solution. The inline comments clearly explain the behavior.
531-536: LGTM! Migration to Calcite standard comparison operators. The direct registration of comparison operators using SqlStdOperatorTable aligns with the PR objective of leveraging Calcite's built-in validation and type coercion instead of custom implementations.
968-982: LGTM! Simplified Builder enforces single-registration constraint. The refactored Builder class enforces that each function name maps to exactly one implementation, preventing ambiguity. The use of IllegalStateException with a clear error message ensures fail-fast behavior during initialization.
984-1098: LGTM! AggBuilder follows a consistent pattern with simplified registration. The AggBuilder refactor mirrors the Builder changes, maintaining consistency across the codebase. The aggregation handler registrations are clean and leverage lambda expressions effectively. The inline implementations (AVG, COUNT, PERCENTILE_APPROX, MEDIAN, EARLIEST, LATEST, FIRST, LAST) are well-structured.
public class OpenSearchTypeUtil {
  /**
   * Whether a given RelDataType is a user-defined type (UDT)
   *
   * @param type the RelDataType to check
   * @return true if the type is a user-defined type, false otherwise
   */
  public static boolean isUserDefinedType(RelDataType type) {
    return type instanceof AbstractExprRelDataType<?>;
  }

  /**
   * Checks if the RelDataType represents a numeric type. Supports standard SQL numeric types
   * (INTEGER, BIGINT, SMALLINT, TINYINT, FLOAT, DOUBLE, DECIMAL, REAL), OpenSearch UDT numeric
   * types, and string types (VARCHAR, CHAR).
   *
   * @param fieldType the RelDataType to check
   * @return true if the type is numeric or string, false otherwise
   */
  public static boolean isNumericOrCharacter(RelDataType fieldType) {
    // Check standard SQL numeric types & string types (VARCHAR, CHAR)
    if (SqlTypeUtil.isNumeric(fieldType) || SqlTypeUtil.isCharacter(fieldType)) {
      return true;
    }

    // Check for OpenSearch UDT numeric types
    if (isUserDefinedType(fieldType)) {
      AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
      ExprType udtType = exprType.getExprType();
      return ExprCoreType.numberTypes().contains(udtType);
    }

    return false;
  }

  /**
   * Checks if the RelDataType represents a time-based field (timestamp, date, or time). Supports
   * both standard SQL time types (including TIMESTAMP, TIMESTAMP_WITH_LOCAL_TIME_ZONE, DATE, TIME,
   * and their timezone variants) and OpenSearch UDT time types.
   *
   * @param fieldType the RelDataType to check
   * @return true if the type is time-based, false otherwise
   */
  public static boolean isDatetime(RelDataType fieldType) {
    // Check standard SQL time types
    if (SqlTypeUtil.isDatetime(fieldType)) {
      return true;
    }

    // Check for OpenSearch UDT types (EXPR_TIMESTAMP mapped to VARCHAR)
    if (isUserDefinedType(fieldType)) {
      AbstractExprRelDataType<?> exprType = (AbstractExprRelDataType<?>) fieldType;
      ExprType udtType = exprType.getExprType();
      return udtType == ExprCoreType.TIMESTAMP
          || udtType == ExprCoreType.DATE
          || udtType == ExprCoreType.TIME;
    }

    return false;
  }

  /**
   * Checks whether a {@link RelDataType} represents a date type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#DATE} type and
   * OpenSearch's user-defined date type {@link OpenSearchTypeFactory.ExprUDT#EXPR_DATE}.
   *
   * @param type the type to check
   * @return true if the type is a date type (built-in or user-defined), false otherwise
   */
  public static boolean isDate(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_DATE) {
        return true;
      }
    }
    return SqlTypeName.DATE.equals(type.getSqlTypeName());
  }

  /**
   * Checks whether a {@link RelDataType} represents a timestamp type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIMESTAMP} type and
   * OpenSearch's user-defined timestamp type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIMESTAMP}.
   *
   * @param type the type to check
   * @return true if the type is a timestamp type (built-in or user-defined), false otherwise
   */
  public static boolean isTimestamp(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt()
          == OpenSearchTypeFactory.ExprUDT.EXPR_TIMESTAMP) {
        return true;
      }
    }
    return SqlTypeName.TIMESTAMP.equals(type.getSqlTypeName());
  }

  /**
   * Checks whether a {@link RelDataType} represents a time type.
   *
   * <p>This method returns true for both Calcite's built-in {@link SqlTypeName#TIME} type and
   * OpenSearch's user-defined time type {@link OpenSearchTypeFactory.ExprUDT#EXPR_TIME}.
   *
   * @param type the type to check
   * @return true if the type is a time type (built-in or user-defined), false otherwise
   */
  public static boolean isTime(RelDataType type) {
    if (isUserDefinedType(type)) {
      if (((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) {
        return true;
      }
    }
    return SqlTypeName.TIME.equals(type.getSqlTypeName());
  }

  /**
   * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
   * user-defined types also have VARCHAR as their SqlTypeName.
   */
  public static boolean isCharacter(RelDataType type) {
    return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
  }

  /**
   * Checks whether a {@link RelDataType} represents an IP address type.
   *
   * <p>This method returns true only for OpenSearch's user-defined IP type {@link
   * OpenSearchTypeFactory.ExprUDT#EXPR_IP}.
   *
   * @param type the type to check
   * @return true if the type is an IP address type, false otherwise
   */
  public static boolean isIp(RelDataType type) {
    if (isUserDefinedType(type)) {
      return ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_IP;
    }
    return false;
  }

  /**
   * Checks whether a {@link RelDataType} represents a binary type.
   *
   * <p>This method returns true for both Calcite's built-in binary types (BINARY, VARBINARY) and
   * OpenSearch's user-defined binary type {@link OpenSearchTypeFactory.ExprUDT#EXPR_BINARY}.
   *
   * @param type the type to check
   * @return true if the type is a binary type (built-in or user-defined), false otherwise
   */
  public static boolean isBinary(RelDataType type) {
    if (isUserDefinedType(type)) {
      return ((AbstractExprRelDataType<?>) type).getUdt()
          == OpenSearchTypeFactory.ExprUDT.EXPR_BINARY;
    }
    return SqlTypeName.BINARY_TYPES.contains(type.getSqlTypeName());
  }
}
Create comprehensive unit tests for OpenSearchTypeUtil utility methods.
Unit tests are mandatory per coding guidelines for all new functions. This utility class contains 9 public static methods with no dedicated test coverage. Create OpenSearchTypeUtilTest.java in core/src/test/java/org/opensearch/sql/calcite/utils/ with test cases covering:
- isUserDefinedType() – verify UDT detection and non-UDT handling
- isNumericOrCharacter() – test both SQL numeric/character types and UDT numeric types
- isDatetime() – verify SQL time types and UDT time types (TIMESTAMP, DATE, TIME)
- isDate(), isTimestamp(), isTime() – test built-in and UDT variants for each type
- isCharacter() – verify non-UDT VARCHAR/CHAR detection only
- isIp() – test UDT IP type detection
- isBinary() – test both SQL binary types and UDT binary type
Include edge cases: null inputs (once null handling is added), mixed type scenarios, and both code paths (built-in vs. UDT) for each method per Calcite integration test requirements.
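A minimal JUnit 4-style sketch of what such a test class could start from, covering only the built-in SqlTypeName paths; the UDT branches would additionally need types produced by the project's OpenSearchTypeFactory, which this sketch does not assume:

package org.opensearch.sql.calcite.utils;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.sql.type.SqlTypeFactoryImpl;
import org.apache.calcite.sql.type.SqlTypeName;
import org.junit.Test;

// Sketch: exercises the non-UDT code paths of OpenSearchTypeUtil with plain Calcite types.
public class OpenSearchTypeUtilTest {
  private final SqlTypeFactoryImpl typeFactory = new SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);

  @Test
  public void testIsDateForBuiltinDateType() {
    RelDataType date = typeFactory.createSqlType(SqlTypeName.DATE);
    assertTrue(OpenSearchTypeUtil.isDate(date));
    assertFalse(OpenSearchTypeUtil.isTimestamp(date));
  }

  @Test
  public void testIsCharacterForPlainVarchar() {
    RelDataType varchar = typeFactory.createSqlType(SqlTypeName.VARCHAR);
    assertTrue(OpenSearchTypeUtil.isCharacter(varchar));
    assertFalse(OpenSearchTypeUtil.isUserDefinedType(varchar));
    assertFalse(OpenSearchTypeUtil.isIp(varchar));
  }

  @Test
  public void testIsNumericOrCharacterForInteger() {
    RelDataType integer = typeFactory.createSqlType(SqlTypeName.INTEGER);
    assertTrue(OpenSearchTypeUtil.isNumericOrCharacter(integer));
    assertFalse(OpenSearchTypeUtil.isDatetime(integer));
  }
}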
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
(around lines 22-178), there are nine public static helpers with no unit tests;
create
core/src/test/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtilTest.java
and implement focused JUnit tests that validate: isUserDefinedType() for both
UDT and non-UDT RelDataType instances; isNumericOrCharacter() for Calcite
numeric/character types and for AbstractExprRelDataType UDT numeric types;
isDatetime() for Calcite datetime types and UDT TIMESTAMP/DATE/TIME; isDate(),
isTimestamp(), isTime() each for built-in SqlTypeName and corresponding
OpenSearchTypeFactory.ExprUDT variants; isCharacter() ensuring it returns true
only for non-UDT VARCHAR/CHAR; isIp() for UDT EXPR_IP; isBinary() for
SqlTypeName binary types and UDT EXPR_BINARY; include mixed-type scenarios and
edge-case assertions (null inputs handling should be tested after adding
null-safety in the production methods — if null-safety is not present, assert
expected NullPointerException to reflect current behavior), and construct UDT
types by instantiating or mocking AbstractExprRelDataType and
OpenSearchTypeFactory.ExprUDT as needed so both code paths (built-in vs UDT) are
covered.
(The diff hunk quoted for this comment repeats the OpenSearchTypeUtil.java excerpt shown above.)
Add null parameter validation to public methods.
All public methods accept RelDataType parameters but perform no null checks. Since this is a new public utility API, consider adding defensive null checks or documenting the non-null requirement with annotations.
🔎 Example fix for null validation
Option 1: Add explicit null checks with meaningful exceptions:
public static boolean isUserDefinedType(RelDataType type) {
+ if (type == null) {
+ throw new IllegalArgumentException("RelDataType parameter cannot be null");
+ }
return type instanceof AbstractExprRelDataType<?>;
}
Option 2: Use Lombok's @NonNull annotation (requires adding Lombok dependency if not already present):
+import lombok.NonNull;
+
-public static boolean isUserDefinedType(RelDataType type) {
+public static boolean isUserDefinedType(@NonNull RelDataType type) {
return type instanceof AbstractExprRelDataType<?>;
}
Apply similar changes to all other public methods in this class.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 29 to 177, all public methods that accept RelDataType
(isUserDefinedType, isNumericOrCharacter, isDatetime, isDate, isTimestamp,
isTime, isCharacter, isIp, isBinary) lack null validation; add defensive null
checks at the start of each public method (e.g., Objects.requireNonNull(type,
"type must not be null") or throw a clear NPE with a descriptive message) or
annotate parameters with a non-null annotation if your codebase prefers that
style, ensuring consistency across all methods and corresponding unit tests are
updated to expect the NPE/annotation contract.
  /**
   * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
   * user-defined types also have VARCHAR as their SqlTypeName.
   */
  public static boolean isCharacter(RelDataType type) {
    return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
  }
Improve JavaDoc formatting and grammar.
The isCharacter method uses comment-style documentation instead of the JavaDoc format used by other methods. Additionally, "in place for" should be "in place of".
🔎 Proposed fix
- /**
- * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because
- * user-defined types also have VARCHAR as their SqlTypeName.
- */
+ /**
+ * Checks whether a {@link RelDataType} represents a character type.
+ *
+ * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because
+ * user-defined types also have VARCHAR as their SqlTypeName but are not character types.
+ *
+ * @param type the type to check
+ * @return true if the type is a character type (excluding UDTs), false otherwise
+ */
public static boolean isCharacter(RelDataType type) {
return !isUserDefinedType(type) && SqlTypeUtil.isCharacter(type);
}
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
around lines 138 to 144, replace the comment-style documentation with proper
JavaDoc and fix the grammar: change "in place for" to "in place of", keep the
{@link SqlTypeUtil#isCharacter(RelDataType)} reference, and ensure the JavaDoc
sentence clearly explains that this method should be used instead of
SqlTypeUtil.isCharacter(...) because user-defined types also have VARCHAR as
their SqlTypeName; update punctuation and formatting to match surrounding method
docs.
protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) {
  register(
      functionName,
      (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args),
      typeChecker);
      functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args));
}
🛠️ Refactor suggestion | 🟠 Major
Add null validation for operator parameter.
The registerOperator method should validate that the operator parameter is not null before creating the lambda. This prevents cryptic NPEs during function resolution and provides clearer error messages at registration time.
🔎 Proposed null check
protected void registerOperator(BuiltinFunctionName functionName, SqlOperator operator) {
+ if (operator == null) {
+ throw new IllegalArgumentException(
+ String.format("Operator cannot be null for function %s", functionName));
+ }
register(
functionName, (RexBuilder builder, RexNode... args) -> builder.makeCall(operator, args));
}
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 511 to 514, the registerOperator method currently accepts a
potentially null SqlOperator which can cause cryptic NPEs later; add an explicit
null check for the operator parameter and throw an IllegalArgumentException (or
Objects.requireNonNull with a clear message) before calling register so the
error is raised at registration time with a descriptive message.
// Register ADD (+ symbol) for string concatenation and numeric addition
// Not creating PPL builtin operator as it will cause confusion during function resolution
FunctionImp add =
    (builder, args) -> {
      SqlOperator op =
          (Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
              ? SqlStdOperatorTable.CONCAT
              : SqlStdOperatorTable.PLUS;
      return builder.makeCall(op, args);
    };
register(ADD, add);
register(ADDFUNCTION, add);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find the PPLFuncImpTable file and examine resolve method calls
find . -name "PPLFuncImpTable.java" -type f 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 146
🏁 Script executed:
#!/bin/bash
# Search for resolve() method calls on FunctionImp
rg -n "\.resolve\(" --type=java -g "**/calcite/**" -A 3 -B 3 | head -100
# Search for similar type checking patterns with Stream
rg -n "Stream.*allMatch.*getType" --type=java | head -50
# Search for null type handling in expression implementations
rg -n "getType\(\).*=.*null\|null.*getType\(\)" --type=java -g "**/expression/**" | head -50Repository: opensearch-project/sql
Length of output: 11845
🏁 Script executed:
#!/bin/bash
# Get the PPLFuncImpTable.resolve method signature
sed -n '1,100p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
# Search for FunctionImp interface definition
rg -n "interface FunctionImp\|class.*FunctionImp" --type=java -A 10
# Check for Stream.allMatch with empty args handling
rg -n "allMatch.*getType\|getType.*allMatch" --type=java -B 2 -A 2 | head -50Repository: opensearch-project/sql
Length of output: 7943
🏁 Script executed:
#!/bin/bash
# Look at the PPLFuncImpTable file around lines 842-853 for ADD function
sed -n '800,900p' ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
# Also search for the resolve method signature
rg -n "public.*resolve\(" ./core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java -A 5
# Find how OpenSearchTypeUtil.isCharacter is implemented
rg -n "isCharacter" --type=java -B 2 -A 5Repository: opensearch-project/sql
Length of output: 19671
🏁 Script executed:
#!/bin/bash
# Check context where resolve is called - look for test patterns
rg -n "resolve.*ADD\|resolve.*ADDFUNCTION" --type=java -B 3 -A 3 | head -60Repository: opensearch-project/sql
Length of output: 48
Add defensive guard for empty args and document the dynamic operator selection pattern.
The type-based dispatch for ADD/ADDFUNCTION works correctly in practice (types are guaranteed post-Calcite validation), but needs defensive handling and documentation:
- Stream.of(args).allMatch() returns true for empty args, incorrectly selecting CONCAT instead of PLUS
- No documentation explaining the Calcite-specific dynamic dispatch pattern (required by coding guidelines)
Add an empty check and Calcite-specific documentation:
FunctionImp add =
(builder, args) -> {
// In Calcite, dynamic operator selection at plan-build time based on validated argument types.
// All RexNodes have types post-validation; allMatch returns true on empty stream, so guard it.
SqlOperator op = (args.length > 0
&& Stream.of(args).map(RexNode::getType).allMatch(OpenSearchTypeUtil::isCharacter))
? SqlStdOperatorTable.CONCAT
: SqlStdOperatorTable.PLUS;
return builder.makeCall(op, args);
};
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
around lines 842 to 853, the ADD/ADDFUNCTION implementation uses
Stream.of(args).allMatch(...) which returns true for empty args and thus wrongly
picks CONCAT; add a defensive guard to check args.length > 0 before evaluating
allMatch and update the inline comment to document the Calcite-specific dynamic
operator selection (types are available post-Calcite validation and allMatch on
empty streams yields true), so choose CONCAT only when args.length > 0 and all
argument types are character, otherwise select PLUS.
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 1
♻️ Duplicate comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
131-141: Operand count validation doesn't respect throwOnFailure parameter.
While the fix now correctly passes throwOnFailure to checkSingleOperandType at line 140, the operand count validation at lines 133-134 still doesn't respect the throwOnFailure parameter. When the operand count is invalid and throwOnFailure is true, the method should throw a validation exception rather than silently returning false.
🔎 Proposed fix
@Override
public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
  if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
+   if (throwOnFailure) {
+     throw callBinding.newValidationSignatureError();
+   }
    return false;
  }
  return IntStream.range(0, 2)
      .allMatch(
          i ->
              checkSingleOperandType(
                  callBinding, callBinding.operand(i), i, throwOnFailure));
}
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
48-81: Operand count validation doesn't enforce odd argument count requirement.
The current implementation improves upon the previous version by enforcing a minimum of 3 arguments (line 72). However, it still doesn't validate that the total argument count is odd, which is required by the function's contract (json_string + path/value pairs).
With an even argument count (e.g., 4 arguments), the validation at lines 56-68 would pass (checking only arg[1] as a path), but the runtime eval method would fail at line 104 when trying to access keys.get(i + 1) for a missing value argument.
🔎 Proposed fix
@Override
public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
  final int count = callBinding.getOperandCount();
+ // Validate odd argument count (json_string + path/value pairs)
+ if (count % 2 == 0) {
+   if (throwOnFailure) {
+     throw callBinding.newValidationSignatureError();
+   }
+   return false;
+ }
  for (int i = 1; i < count; i += 2) {
    RelDataType nameType = callBinding.getOperandType(i);
    if (!OpenSearchTypeUtil.isCharacter(nameType)) {
      if (throwOnFailure) {
        throw callBinding.newError(RESOURCE.expectedCharacter());
      }
      return false;
    }
  }
  return true;
}
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
62-152: Consider documenting the arbitrary additional arguments behavior.
The implementation allows arbitrary arguments after the first two (array and function), as noted in the comment at line 63. However, the runtime behavior in the eval method (lines 195-208) currently only supports one captured variable. The limitation is documented in comments at lines 197-202, but users might not be aware of this constraint when calling the function.
Consider adding JavaDoc to the class or method level (a possible sketch follows the list below) explaining:
- The function accepts additional arguments as captured variables for the lambda
- Currently only the first captured variable is fully supported
- The behavior difference between Function1 (no captures/index) and Function2 (captures or index)
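A possible JavaDoc sketch covering these three points (the wording is illustrative only, not the final text):

/**
 * Transforms each element of the input array with the given lambda.
 *
 * <p>Arguments after the first two (array, lambda) are treated as captured variables that are
 * forwarded to the lambda. Currently only the first captured variable is fully supported at
 * runtime: a Function1 lambda receives just the element, while a Function2 lambda receives the
 * element plus either the element index or the first captured variable.
 */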
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (1)
84-90: Consider instance-scoped dependency injection over static mutable state.
The static operatorTableProvider field with a public setter creates global mutable state that could complicate testing (tests must coordinate access to shared state) and scenarios where different contexts need different operator tables. While this pattern breaks circular dependencies as noted in the PR objectives, consider whether you could pass the provider via the create() factory method or constructor instead. If the static approach is necessary, document the initialization contract and consider thread-safety if setOperatorTableProvider could be called concurrently.
Alternative instance-based approach
  @Setter private static SqlOperatorTableProvider operatorTableProvider;
+ private final SqlOperatorTableProvider instanceOperatorTableProvider;

  private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType) {
+   this(config, sysLimit, queryType, operatorTableProvider);
+ }
+
+ private CalcitePlanContext(FrameworkConfig config, SysLimit sysLimit, QueryType queryType, SqlOperatorTableProvider provider) {
    this.config = config;
    this.sysLimit = sysLimit;
    this.queryType = queryType;
+   this.instanceOperatorTableProvider = provider;
    // ... rest of initialization
Then use instanceOperatorTableProvider in getValidator() instead of the static field.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.javacore/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javacore/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Development requires JDK 21 for the OpenSearch SQL project
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java
🧬 Code graph analysis (2)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/TransformFunctionImpl.java (1)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/LambdaUtils.java (1)
LambdaUtils(11-29)
core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java (5)
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java (1)
CalciteToolsHelper(99-379)core/src/main/java/org/opensearch/sql/calcite/validate/OpenSearchSparkSqlDialect.java (1)
OpenSearchSparkSqlDialect(28-113)core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercion.java (1)
PplTypeCoercion(43-203)core/src/main/java/org/opensearch/sql/calcite/validate/PplTypeCoercionRule.java (1)
PplTypeCoercionRule(45-107)core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.java (1)
PplValidator(50-205)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: test-sql-cli-integration (21)
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)
91-101: LGTM! Empty allowedParamTypes case is now handled correctly.
The previous review concern about invalid operand count range when allowedParamTypes is empty has been properly addressed. The code now checks for null/empty at line 92 and returns a valid range of (0,0) before computing min/max.
core/src/main/java/org/opensearch/sql/expression/function/jsonUDF/JsonSetFunctionImpl.java (1)
56-68: Ensure comprehensive validation before runtime execution.The operand type validation correctly checks that path arguments are character types, which prevents type mismatches. However, this validation should be paired with the odd-count check (flagged separately) to ensure all runtime preconditions are validated during the validation phase rather than failing later in
eval().
public SqlValidator getValidator() {
  if (validator == null) {
    synchronized (this) {
      // Double-Checked Locking for thread-safety
      if (validator == null) {
        final CalciteServerStatement statement;
        try {
          statement = connection.createStatement().unwrap(CalciteServerStatement.class);
        } catch (SQLException e) {
          throw new RuntimeException(e);
        }
        if (operatorTableProvider == null) {
          throw new IllegalStateException(
              "SqlOperatorTableProvider must be set before creating CalcitePlanContext");
        }
        SqlValidator.Config validatorConfig =
            SqlValidator.Config.DEFAULT
                .withTypeCoercionRules(PplTypeCoercionRule.instance())
                .withTypeCoercionFactory(PplTypeCoercion::create)
                // Use lenient conformance for PPL compatibility
                .withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
                // Use Spark SQL's NULL collation (NULLs sorted LOW/FIRST)
                .withDefaultNullCollation(NullCollation.LOW)
                // This ensures that coerced arguments are replaced with cast version in sql
                // select list because coercion is performed during select list expansion during
                // sql validation. Affects 4356.yml
                // See SqlValidatorImpl#validateSelectList and AggConverter#translateAgg
                .withIdentifierExpansion(true);
        validator =
            PplValidator.create(
                statement,
                config,
                operatorTableProvider.getOperatorTable(),
                TYPE_FACTORY,
                validatorConfig);
      }
    }
  }
  return validator;
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find CalciteServerStatement in the codebase
echo "=== Searching for CalciteServerStatement ==="
find . -type f -name "*.java" | xargs grep -l "CalciteServerStatement" | head -20
# Look for the file and class definition
echo -e "\n=== Checking if CalciteServerStatement is from Calcite library ==="
grep -r "class CalciteServerStatement\|interface CalciteServerStatement" --include="*.java" 2>/dev/null | head -5Repository: opensearch-project/sql
Length of output: 403
🏁 Script executed:
#!/bin/bash
# Find CalcitePlanContext to examine the actual implementation
echo "=== Finding CalcitePlanContext file ==="
find . -type f -name "CalcitePlanContext.java"
# Get the complete file content to understand the context
echo -e "\n=== CalcitePlanContext content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.javaRepository: opensearch-project/sql
Length of output: 11109
🏁 Script executed:
#!/bin/bash
# Search for PplValidator.create to understand how statement is used
echo "=== Searching for PplValidator.create ==="
find . -type f -name "*.java" | xargs grep -A 20 "PplValidator.create\|class PplValidator" | head -100Repository: opensearch-project/sql
Length of output: 6457
🏁 Script executed:
#!/bin/bash
# Check how CalciteServerStatement is used elsewhere in the codebase
echo "=== CalciteServerStatement usage patterns ==="
find . -type f -name "*.java" -exec grep -H "CalciteServerStatement\|createStatement" {} \; | grep -v "\.class" | head -50Repository: opensearch-project/sql
Length of output: 3384
🏁 Script executed:
#!/bin/bash
# Get the complete CalciteToolsHelper file
echo "=== CalciteToolsHelper content ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.javaRepository: opensearch-project/sql
Length of output: 14929
🏁 Script executed:
#!/bin/bash
# Also check PplValidator to see if it stores statement reference
echo "=== PplValidator.java (full file) ==="
cat ./core/src/main/java/org/opensearch/sql/calcite/validate/PplValidator.javaRepository: opensearch-project/sql
Length of output: 8498
🏁 Script executed:
#!/bin/bash
# Check how CalcitePlanContext is created and used - who creates it?
echo "=== Searching for CalcitePlanContext.create() callers ==="
rg -B2 -A2 "CalcitePlanContext\.create\|new CalcitePlanContext" --type java | head -100Repository: opensearch-project/sql
Length of output: 123
🏁 Script executed:
#!/bin/bash
# Search for CalcitePlanContext usage in test and main files
echo "=== CalcitePlanContext usage in codebase ==="
rg "CalcitePlanContext" --type java -B 1 -A 1 | head -150Repository: opensearch-project/sql
Length of output: 15090
🏁 Script executed:
#!/bin/bash
# Look for unit tests of CalcitePlanContext to understand lifecycle
echo "=== CalcitePlanContext test files ==="
find . -type f -name "*CalcitePlanContext*Test*.java" -o -name "*Test*CalcitePlanContext*.java"Repository: opensearch-project/sql
Length of output: 103
🏁 Script executed:
#!/bin/bash
# Check if there is a close() method being called on CalcitePlanContext anywhere
echo "=== Searching for CalcitePlanContext close/cleanup ==="
rg "\.close\(\)|try.*CalcitePlanContext|AutoCloseable.*CalcitePlanContext" --type javaRepository: opensearch-project/sql
Length of output: 10796
🏁 Script executed:
#!/bin/bash
# Check how withPrepare is used - does it handle statement cleanup?
echo "=== CalciteToolsHelper.withPrepare usage and statement handling ==="
rg "withPrepare\|perform" core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java -B 2 -A 5
# Check if there's any statement closing elsewhere in related code
echo -e "\n=== Searching for statement management patterns ==="
rg "createStatement\(\)" --type java -B 2 -A 5 core/src/main/java/org/opensearch/sql/calcite/ | head -80Repository: opensearch-project/sql
Length of output: 1995
🏁 Script executed:
#!/bin/bash
# Verify CalciteServerStatement is from Calcite library and check its inheritance
echo "=== Check Calcite imports for CalciteServerStatement ==="
grep -r "import.*CalciteServerStatement" core/src/main/java/org/opensearch/sql/calcite/ --include="*.java"
# Look at how statements created elsewhere are managed (e.g., in tests that use try-with-resources)
echo -e "\n=== Statement resource patterns in tests ==="
rg "try.*createStatement\|createStatement.*try" --type java -B 1 -A 3 integ-test/ | head -50Repository: opensearch-project/sql
Length of output: 558
Address resource leak: Close the statement after validator creation.
The underlying Statement resource created by connection.createStatement() is not being closed. Although CalciteServerStatement itself doesn't implement AutoCloseable, it wraps a JDBC Statement that does hold resources requiring proper cleanup.
The statement is only needed to extract the prepare context via statement.createPrepareContext(). Since it's not stored or reused, close it immediately after validator initialization:
final CalciteServerStatement statement;
try {
statement = connection.createStatement().unwrap(CalciteServerStatement.class);
if (operatorTableProvider == null) {
throw new IllegalStateException(
"SqlOperatorTableProvider must be set before creating CalcitePlanContext");
}
SqlValidator.Config validatorConfig =
SqlValidator.Config.DEFAULT
.withTypeCoercionRules(PplTypeCoercionRule.instance())
.withTypeCoercionFactory(PplTypeCoercion::create)
.withConformance(OpenSearchSparkSqlDialect.DEFAULT.getConformance())
.withDefaultNullCollation(NullCollation.LOW)
.withIdentifierExpansion(true);
validator =
PplValidator.create(
statement,
config,
operatorTableProvider.getOperatorTable(),
TYPE_FACTORY,
validatorConfig);
} catch (SQLException e) {
throw new RuntimeException(e);
} finally {
if (statement != null) {
try {
statement.close();
} catch (SQLException e) {
// log or handle exception
}
}
}
Also check CalciteToolsHelper.withPrepare() for the same pattern: it creates a statement at line 137 that should be closed after OpenSearchPrepareImpl().perform() completes.
The double-checked locking implementation with volatile is correct.
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/CalcitePlanContext.java around
lines 129-168, the Statement created via
connection.createStatement().unwrap(CalciteServerStatement.class) is never
closed causing a resource leak; after using the statement to create the
validator, ensure the underlying JDBC Statement is closed (e.g., wrap
creation/use in try/finally or try-with-resources so statement.close() is always
called and handle/log any SQLException from close) and keep the existing
exception handling for creation; also review CalciteToolsHelper.withPrepare()
(around line 137) where a statement is created and ensure that statement is
likewise closed after OpenSearchPrepareImpl().perform() completes.
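Since the statement is only needed while building the validator, a try-with-resources variant is also conceivable. This is a sketch against the snippet above, assuming validatorConfig and the operatorTableProvider null check are prepared before the block; it is not the committed code:

try (java.sql.Statement jdbcStatement = connection.createStatement()) {
  // CalciteServerStatement itself is not AutoCloseable, but the JDBC statement wrapping it is,
  // so the underlying resource is released when this block exits.
  CalciteServerStatement statement = jdbcStatement.unwrap(CalciteServerStatement.class);
  validator =
      PplValidator.create(
          statement,
          config,
          operatorTableProvider.getOperatorTable(),
          TYPE_FACTORY,
          validatorConfig);
} catch (SQLException e) {
  throw new RuntimeException(e);
}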
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Actionable comments posted: 2
♻️ Duplicate comments (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
138-144: JavaDoc grammar issue persists.The phrase "in place for" on line 139 should be "in place of" as noted in a previous review. Additionally, the JavaDoc format differs from other methods in this class.
🔎 Proposed fix
/** - * This method should be used in place for {@link SqlTypeUtil#isCharacter(RelDataType)} because - * user-defined types also have VARCHAR as their SqlTypeName. + * Checks whether a {@link RelDataType} represents a character type. + * + * <p>This method should be used in place of {@link SqlTypeUtil#isCharacter(RelDataType)} because + * user-defined types also have VARCHAR as their SqlTypeName but are not character types. + * + * @param type the type to check + * @return true if the type is a character type (excluding UDTs), false otherwise */ public static boolean isCharacter(RelDataType type) {
🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java (1)
57-79: Signature string doesn't reflect all accepted operand types.
The getAllowedSignatures method returns "CIDRMATCH(<IP>, <STRING>)" but the type checker also accepts (STRING, STRING) via OperandTypes.CHARACTER_CHARACTER.or(...). Consider updating the signature to reflect both accepted forms.
🔎 Proposed fix
@Override
public String getAllowedSignatures(SqlOperator op, String opName) {
- return "CIDRMATCH(<IP>, <STRING>)";
+ return "CIDRMATCH(<IP>, <STRING>), CIDRMATCH(<STRING>, <STRING>)";
}
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
492-509: Consider using createCall instead of mutating the SqlCall in place.
The rewriteCall implementation mutates the SqlBasicCall via setOperator(op) before returning it. While this pattern is consistent with similar operator rewriting in PplConvertletTable, creating a new call via createCall would be more explicit and safer. The codebase uses createCall for similar transformation scenarios elsewhere, so adopting this approach would improve consistency.
@Override
public SqlNode rewriteCall(SqlValidator validator, SqlCall call) {
  SqlOperator op =
      call.getOperandList().size() == 2
          ? SqlStdOperatorTable.ATAN2
          : SqlStdOperatorTable.ATAN;
- ((SqlBasicCall) call).setOperator(op);
- return call;
+ return op.createCall(call.getParserPosition(), call.getOperandList());
}
};
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java
🧰 Additional context used
📓 Path-based instructions (4)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
core/src/main/java/org/opensearch/sql/expression/**/*.java
⚙️ CodeRabbit configuration file
core/src/main/java/org/opensearch/sql/expression/**/*.java: - New expression implementations must follow existing patterns
- Type handling must be consistent with project type system
- Error handling must use appropriate exception types
- Null handling must be explicit and documented
Files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.javacore/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing Calcite integration patterns
- Verify RelNode visitor implementations are complete
- Check RexNode handling follows project conventions
- Validate SQL generation is correct and optimized
- Ensure Calcite version compatibility
- Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.javacore/src/main/java/org/opensearch/sql/expression/function/udf/ip/CidrMatchFunction.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.javacore/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : All new business logic requires unit tests
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Encourage comprehensive test coverage in code reviews
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Prefer `Optional<T>` for nullable returns in Java
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Validate all user inputs, especially queries
Applied to files:
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (1)
core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeFactory.java (1)
OpenSearchTypeFactory(62-394)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (8)
core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java (2)
242-293: Well-structured custom type checkers for scalar validation.The
SCALARandSCALAR_OPTIONAL_INTEGERimplementations correctly delegate toOpenSearchTypeUtil.isScalarand follow the established pattern for customSqlOperandTypeCheckerimplementations. The operand count validation before type checking is appropriate defensive programming.
101-156: Good documentation of Calcite schema differences.The inline comments explaining the difference between OpenSearch type system (INTERVAL as STRING) and Calcite SCOTT schema behavior are helpful for maintainability. This follows the coding guideline to document Calcite-specific workarounds.
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/IPFunction.java (1)
50-71: Clean implementation of IP function operand validation.The custom
SqlOperandTypeCheckercorrectly validates that the single operand is either an IP or character type, and properly uses thethrowOnFailureparameter to throw a validation signature error when needed. The signature string accurately reflects both accepted input types.core/src/main/java/org/opensearch/sql/expression/function/UDFOperandMetadata.java (1)
26-62: Clean API simplification.The refactored
wrapmethod provides a simpler interface that delegates all type checking to the wrappedSqlOperandTypeChecker. TheUnsupportedOperationExceptionforparamTypesandparamNamesis appropriate since these methods are not used in the current context, and the comments make this clear.core/src/main/java/org/opensearch/sql/calcite/utils/OpenSearchTypeUtil.java (2)
216-224: Good explicit null handling in isScalar.The
isScalarmethod explicitly handles null input and documents it in the JavaDoc (@param type the type to check; may be null). This is appropriate for this method's use case. Note that other methods in this class don't have null handling, which is acceptable if callers guarantee non-null inputs, but inconsistent.
159-178: Well-documented Calcite workaround for IP type.The JavaDoc clearly explains that
SqlTypeName.OTHERis "borrowed" to represent IP type during validation becauseSqlTypeName.IPdoesn't exist. This follows the coding guideline to document Calcite-specific workarounds. The overload pattern allows callers to opt-in to accepting OTHER when appropriate.core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (2)
530-541: Parameter type narrowed from SqlOperator to SqlFunction.The parameter type change from
SqlOperatortoSqlFunctionis a narrowing change. Since this is a private method, the impact is limited to this class. This aligns with the broader refactoring converting operators to functions throughout the file.
510-514: Clear explanation for explicit SPARK function declaration.The comment explains the design decision to explicitly declare SPARK dialect functions to avoid operator resolution issues with multiple overrides. This is helpful documentation for future maintainers.
(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
    .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));
Inconsistent use of SqlTypeFamily.STRING vs SqlTypeFamily.CHARACTER.
Line 78 uses SqlTypeFamily.STRING while the rest of the file consistently uses SqlTypeFamily.CHARACTER for string types (e.g., lines 41, 65-66). SqlTypeFamily.STRING and SqlTypeFamily.CHARACTER are distinct in Calcite—CHARACTER is the standard family for CHAR/VARCHAR.
🔎 Proposed fix
public static final UDFOperandMetadata NUMERIC_STRING_OR_STRING_STRING =
UDFOperandMetadata.wrap(
- (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
- .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));
+ (OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.CHARACTER))
+ .or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER)));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:
(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.STRING))
    .or(OperandTypes.family(SqlTypeFamily.STRING, SqlTypeFamily.STRING)));
Suggested:
(OperandTypes.family(SqlTypeFamily.NUMERIC, SqlTypeFamily.CHARACTER))
    .or(OperandTypes.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER)));
🤖 Prompt for AI Agents
In core/src/main/java/org/opensearch/sql/calcite/utils/PPLOperandTypes.java
around lines 78 to 79, there's an inconsistent use of SqlTypeFamily.STRING;
replace SqlTypeFamily.STRING with SqlTypeFamily.CHARACTER so the operand family
call matches the rest of the file (use CHARACTER for CHAR/VARCHAR), ensuring all
string-type checks consistently use SqlTypeFamily.CHARACTER.
return UDFOperandMetadata.wrap(
    new SqlOperandTypeChecker() {
      @Override
      public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
        if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
          return false;
        }
        return OpenSearchTypeUtil.isIp(callBinding.getOperandType(0), true)
            && OpenSearchTypeUtil.isIp(callBinding.getOperandType(1), true);
      }

      @Override
      public SqlOperandCountRange getOperandCountRange() {
        return SqlOperandCountRanges.of(2);
      }

      @Override
      public String getAllowedSignatures(SqlOperator op, String opName) {
        return String.format(Locale.ROOT, "%s(<IP>, <IP>)", opName);
      }
    });
🧩 Analysis chain
🏁 Script executed:
rg -n "CompareIpFunction|compare.*IP.*STRING|GREATER_IP|LESS_IP" --type java -g "*Test*.java" -A 3 | head -60Repository: opensearch-project/sql
Length of output: 48
🏁 Script executed:
fd -t f "*Test*.java" | xargs rg -l "Ip|IP" | head -20Repository: opensearch-project/sql
Length of output: 1964
🏁 Script executed:
cat -n core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.javaRepository: opensearch-project/sql
Length of output: 8471
🏁 Script executed:
find . -type f -name "OpenSearchTypeUtil.java" | head -5Repository: opensearch-project/sql
Length of output: 143
🏁 Script executed:
rg -n "isIp.*acceptOther|public.*isIp.*boolean" --type java -A 15 | head -80Repository: opensearch-project/sql
Length of output: 1941
Class-level documentation claims unsupported STRING operand signatures.
The class Javadoc (lines 42-47) documents signatures (IP, STRING) and (STRING, IP), but the type checker uses isIp(type, true) which only accepts:
- User-defined EXPR_IP type
- SqlTypeName.OTHER
It does not accept VARCHAR/STRING types. While the runtime implementation can convert String objects to ExprIpValue (lines 194-195), these conversions only occur after type validation, which rejects VARCHAR operands. The getAllowedSignatures method correctly reflects the actual type checker behavior.
Update the class Javadoc to remove the STRING signature claims, or modify the type checker to accept VARCHAR types if STRING support is intended.
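If STRING support is the intended contract, the per-operand check could be relaxed along these lines; this is a sketch reusing the utilities shown earlier in this review, not the committed implementation:

@Override
public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) {
  if (!getOperandCountRange().isValidCount(callBinding.getOperandCount())) {
    return false;
  }
  for (int i = 0; i < 2; i++) {
    RelDataType operandType = callBinding.getOperandType(i);
    // Accept the EXPR_IP UDT (or OTHER, its validation-time stand-in) as well as CHAR/VARCHAR.
    if (!OpenSearchTypeUtil.isIp(operandType, true)
        && !OpenSearchTypeUtil.isCharacter(operandType)) {
      if (throwOnFailure) {
        throw callBinding.newValidationSignatureError();
      }
      return false;
    }
  }
  return true;
}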
🤖 Prompt for AI Agents
In
core/src/main/java/org/opensearch/sql/expression/function/udf/ip/CompareIpFunction.java
around lines 127 to 147, the runtime only accepts EXPR_IP/OTHER via
OpenSearchTypeUtil.isIp(type, true) and rejects VARCHAR/STRING, so update the
class Javadoc (earlier Javadoc lines ~42-47) to remove the documented (IP,
STRING) and (STRING, IP) signatures (or alternatively, if STRING support is
intended, change the type checker here to also accept
SqlTypeName.VARCHAR/CHAR/STRING and add tests); ensure the Javadoc signatures
match the actual type checker behavior and, if keeping STRING support, also
move/adjust validation so the runtime string-to-ExprIpValue conversion (around
lines 194-195) is reachable during type checking.
Description
Previously, we implemented our own type checking mechanism because the SQL validation phase, which relies on SqlNode, was missing and because of compatibility issues with user-defined types.
With this PR, I solve these problems and migrate to Calcite's type checking mechanism by introducing a SqlNode layer. This is Approach 1 described in #3865.
It has the following advantages:
- sqrt(x) by replacing it with pow(x, 0.5) in the SqlNode layer.
Please feel free to ask why something was changed and why it no longer matches during review!
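To make the SqlNode-layer rewrite concrete, here is a minimal sketch that replaces SQRT(x) with POWER(x, 0.5); it assumes the call resolves to Calcite's standard SQRT operator, which may not match the actual PPL operator used in this PR:

import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlLiteral;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.sql.parser.SqlParserPos;
import org.apache.calcite.sql.util.SqlShuttle;

/** Rewrites SQRT(x) to POWER(x, 0.5) on the SqlNode tree before validation. */
class SqrtToPowerShuttle extends SqlShuttle {
  @Override
  public SqlNode visit(SqlCall call) {
    if (call.getOperator() == SqlStdOperatorTable.SQRT) {
      return SqlStdOperatorTable.POWER.createCall(
          SqlParserPos.ZERO,
          call.operand(0),
          SqlLiteral.createExactNumeric("0.5", SqlParserPos.ZERO));
    }
    return super.visit(call);
  }
}

Applying such a shuttle (sqlNode.accept(new SqrtToPowerShuttle())) before validation lets the rewritten call be type-checked as an ordinary POWER invocation.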
Work items:
- ... SqlTypeName, therefore there is no space for UDTs there. I did the trick by replacing the operand types of a function call before type checking and restoring the types after type checking, e.g. EXPR_TIME -> SqlTypeName.TIME (1090/1599 ITs passed) (see the sketch after this list)
- ... SqlTypeName.TIMESTAMP). However, when the return type is related to the argument type, this will lead the return type to be a SQL type instead of a UDT, leading to an incorrect plan and validation failures. We need a way that only uses UDTs at the top level and SQL types in intermediate steps. Fixed by overriding commonTypeForBinaryComparison (1240/1599 ITs passed)
- ... IP type in Calcite. For the purpose of type checking, I may have to use an existing SQL type to fake it.
- The ATAN function has a one-operand and a two-operand version: define a new operator that rewrites the SQL call conditionally
- ADD can be string concatenation or numeric addition - fixed by resolving the call conditionally in FuncImpTable
- ... LOG(a, b)
- ... testRand)
- The reduce function's type inference is incomplete -- it will derive return types at both the SqlNode and RelNode level, ensuring that they are the same. The current implementation works only at the RelNode level
- testStatsWithBinsOnTimeAndTermField_Avg and testStatsWithBinsOnTimeAndTermField_Count
- testStatsWithBinsOnTimeField_Count and testStatsWithBinsOnTimeField_Avg
- ... bucket_nullable correctly
- rel node -> sql node -> rel node round-trip conversion: fixed by adding SqlHint after converting LogicalAggregation
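As an illustration of the operand-type trick above, here is a hedged sketch of the UDT-to-SQL-type mapping; the helper name is hypothetical, and the real code also restores the original types after checking:

// Hypothetical helper: map a UDT to the plain SQL type used while Calcite checks operand types.
static RelDataType toValidationType(RelDataTypeFactory typeFactory, RelDataType type) {
  if (type instanceof AbstractExprRelDataType<?>
      && ((AbstractExprRelDataType<?>) type).getUdt() == OpenSearchTypeFactory.ExprUDT.EXPR_TIME) {
    // e.g. EXPR_TIME is checked as a plain SqlTypeName.TIME
    return typeFactory.createSqlType(SqlTypeName.TIME);
  }
  return type;
}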
Resolves #4636, resolves #3865
Check List
--signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.