fix: Enable explode() and flatten() table functions in local testing …#4085
Open
JoshElkind wants to merge 1 commit into snowflakedb:main
fix: Enable explode() and flatten() table functions in local testing mode
This addresses GitHub issue #3565 (SNOW-2213161) where calling explode() on a DataFrame in local testing mode failed with:
AttributeError: 'MockSelectStatement' object has no attribute 'snowflake_plan'
Root Cause:
1. MockSelectStatement lacked the `snowflake_plan` property that table_function.py expects for schema inference.
2. The mock execution engine had no implementation for the built-in FLATTEN table function (used internally by explode).
Changes:
1. Added `snowflake_plan` property to MockSelectable (_select_statement.py) as an alias for `execution_plan`. Includes a detailed comment explaining why this alias is safe and what could cause future divergence.
2. Implemented `handle_flatten_function()` in _plan.py with:
   - Clear documentation of supported vs unsupported parameters
   - NotImplementedError for the PATH and RECURSIVE parameters (not supported)
   - Support for the input, outer, and mode parameters
   - Array flattening (VALUE column)
   - Object/dict flattening (KEY + VALUE columns)
3. Added comprehensive tests (13 total) covering:
   - Basic explode with arrays, integers, maps
   - Schema verification
   - explode_outer with empty/null arrays
   - Column preservation
   - Direct flatten() usage
   - NotImplementedError for unsupported parameters
   - Verification that non-flatten UDTFs still work
Design Decisions:
- This is a MINIMAL implementation for explode() compatibility, NOT a complete FLATTEN implementation. Unsupported features raise clear errors.
- Early-return pattern ensures the existing UDTF code path is unchanged.
- SEQ, PATH, INDEX, THIS output columns are not implemented (use integration tests if your tests require these).
Testing:
- 468 mock tests pass (13 new)
- 1759 unit tests pass
- Performance: ~65ms for 1000 rows (acceptable for local testing)
Which Jira issue is this PR addressing?
Fixes SNOW-2213161
Fill out the following pre-review checklist:
[x] I am adding a new automated test(s) to verify correctness of my new code
[ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
[ ] I am adding new logging messages
[ ] I am adding a new telemetry message
[ ] I am adding new credentials
[ ] I am adding a new dependency
[x] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
[x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
[ ] If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
Please describe how your code solves the related issue.
The fix addresses two gaps in the local testing mock framework:
Problem 1: Missing API compatibility
When explode() is called, table_function.py accesses select_statement.snowflake_plan.output to infer column types. MockSelectStatement didn't have this property, causing the AttributeError.
Solution: Added snowflake_plan as a property alias to execution_plan in MockSelectable. This is safe because MockExecutionPlan.output provides the same List[Attribute] interface that SnowflakePlan.output does.
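A minimal sketch of the alias pattern described above, not the actual diff. The classes here are simplified, hypothetical stand-ins that only borrow the names from this description; the real Snowpark classes carry far more state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """Hypothetical stand-in for a column descriptor (name and type)."""
    name: str
    datatype: str


@dataclass
class MockExecutionPlan:
    """Hypothetical stand-in for the mock plan; exposes `output` like a real plan."""
    output: List[Attribute] = field(default_factory=list)


class MockSelectable:
    """Hypothetical stand-in for the mock selectable base class."""

    def __init__(self, execution_plan: MockExecutionPlan) -> None:
        self.execution_plan = execution_plan

    @property
    def snowflake_plan(self) -> MockExecutionPlan:
        # Alias: callers that read `.snowflake_plan.output` receive the mock
        # plan's attribute list. This holds only while both plans expose the
        # same `output` interface.
        return self.execution_plan


class MockSelectStatement(MockSelectable):
    """Inherits the alias, so attribute lookup no longer fails."""


stmt = MockSelectStatement(MockExecutionPlan([Attribute("ARR", "ARRAY")]))
column_names = [attr.name for attr in stmt.snowflake_plan.output]
```

Because the property returns the same object rather than a copy, the two names cannot drift apart at runtime; divergence would only arise if callers start relying on plan attributes other than `output`.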
Problem 2: No FLATTEN implementation
explode() internally uses Snowflake's built-in FLATTEN table function. The mock framework only handled user-defined table functions (UDTFs), not built-ins.
Solution: Added handle_flatten_function() that executes FLATTEN logic locally:
- For arrays: expands each element into a row with VALUE column
- For objects/dicts: expands each key-value pair into rows with KEY and VALUE columns
- Supports outer=True for null/empty handling (used by explode_outer)
- Raises NotImplementedError for unsupported parameters (path, recursive) to prevent silent behavior drift
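The behavior listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in this PR: `flatten_value` and `explode_rows` are hypothetical names, rows are modeled as dicts, and the `mode` values (OBJECT, ARRAY, BOTH) are assumed to follow Snowflake's FLATTEN semantics.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def flatten_value(
    value: Any,
    outer: bool = False,
    mode: str = "BOTH",
    path: Optional[str] = None,
    recursive: bool = False,
) -> List[Row]:
    """Expand one array or object into rows with KEY and VALUE entries."""
    # Fail loudly on parameters this sketch does not model.
    if path:
        raise NotImplementedError("PATH is not supported in this sketch")
    if recursive:
        raise NotImplementedError("RECURSIVE is not supported in this sketch")

    mode = mode.upper()
    if mode not in ("OBJECT", "ARRAY", "BOTH"):
        raise ValueError(f"unknown mode: {mode}")

    rows: List[Row] = []
    if isinstance(value, dict) and mode in ("OBJECT", "BOTH"):
        rows = [{"KEY": k, "VALUE": v} for k, v in value.items()]
    elif isinstance(value, (list, tuple)) and mode in ("ARRAY", "BOTH"):
        rows = [{"KEY": None, "VALUE": v} for v in value]

    # outer=True keeps the input row by emitting a single all-null row.
    if not rows and outer:
        rows = [{"KEY": None, "VALUE": None}]
    return rows


def explode_rows(rows: List[Row], column: str, outer: bool = False) -> List[Row]:
    """Apply flatten_value to one column, carrying the other columns along."""
    result: List[Row] = []
    for row in rows:
        for item in flatten_value(row[column], outer=outer):
            result.append({**row, **item})
    return result


table = [
    {"ID": 1, "ARR": [10, 20]},
    {"ID": 2, "ARR": []},
    {"ID": 3, "ARR": None},
]
inner = explode_rows(table, "ARR")
outer_rows = explode_rows(table, "ARR", outer=True)
```

Here `inner` drops the rows with empty or null arrays, while `outer_rows` keeps them with a null VALUE, which is the distinction between explode and explode_outer.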
Isolation: The fix uses an early-return pattern in handle_udtf_expression() so existing UDTF handling is completely unchanged. A dedicated test verifies custom UDTFs still work after this change.
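The early-return pattern might look roughly like the sketch below. The dispatcher, the `flatten_builtin` helper, and the dict-based UDTF registry are all hypothetical simplifications; the real `handle_udtf_expression()` has a different signature and works on plan expressions.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Handler = Callable[..., List[Row]]


def flatten_builtin(value: Any) -> List[Row]:
    """Hypothetical built-in handler: one VALUE row per array element."""
    return [{"VALUE": item} for item in (value or [])]


def handle_table_function(name: str, args: List[Any], udtfs: Dict[str, Handler]) -> List[Row]:
    # Early return: the built-in is intercepted before any UDTF logic runs.
    if name.upper() == "FLATTEN":
        return flatten_builtin(*args)

    # Pre-existing path, untouched by the new branch above.
    if name not in udtfs:
        raise KeyError(f"unknown table function: {name}")
    return udtfs[name](*args)


udtfs: Dict[str, Handler] = {
    "split_words": lambda text: [{"WORD": w} for w in text.split()],
}
flattened = handle_table_function("flatten", [[1, 2]], udtfs)
words = handle_table_function("split_words", ["a b"], udtfs)
```

Placing the built-in check first means a regression in the new branch cannot change the result of a registered UDTF call, which is the property the dedicated test is meant to guard.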