SNOW-2084165 Add dataframe operation lineage on SnowparkSQLException #3339
sfc-gh-aalam merged 95 commits into main from aalam-SNOW-2084165-add-error-trace
Conversation
🎉 Snyk checks have passed: the security/snyk and license/snyk checks are complete, and no issues have been found.
```python
        """Returns the batch_ids of the children of this node."""
        return get_dependent_bind_ids(self.stmt_cache[self.batch_id])

    def get_src(self) -> Optional[proto.SrcPosition]:
```
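For context, the lineage node discussed in this diff could be sketched roughly as below. The names `get_dependent_bind_ids`, `batch_id`, and `stmt_cache` come from the diff itself, but the `Stmt` class here is a simplified stand-in for `proto.Stmt`, not the actual Snowpark implementation:

```python
from typing import Dict, Set


class Stmt:
    """Simplified stand-in for proto.Stmt: records the IDs it depends on."""

    def __init__(self, dependent_bind_ids: Set[int]) -> None:
        self.dependent_bind_ids = dependent_bind_ids


def get_dependent_bind_ids(stmt: Stmt) -> Set[int]:
    """Return the bind IDs this statement depends on."""
    return stmt.dependent_bind_ids


class LineageNode:
    """A node in the DAG describing the lineage of a DataFrame."""

    def __init__(self, batch_id: int, stmt_cache: Dict[int, Stmt]) -> None:
        self.batch_id = batch_id
        self.stmt_cache = stmt_cache

    @property
    def children(self) -> Set[int]:
        """Return the batch_ids of the children of this node."""
        return get_dependent_bind_ids(self.stmt_cache[self.batch_id])
```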
In the hybrid client prototype we use a slightly different method to get the source location: we just use `inspect` to walk the stack to the appropriate frame. We have to do this because modin does not use any of the AST machinery, but it is also relatively straightforward.
I would like to use your debugging tool for snowpandas as well, but we may want to refactor this so it does not require any of the protobuf work.
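The stack-walking approach mentioned here could look something like the following. This is a hypothetical sketch, not the hybrid client's actual code; it assumes the first frame whose module is outside the `snowflake.snowpark` package is the user's call site:

```python
import inspect
from typing import Optional, Tuple


def get_user_source_location() -> Optional[Tuple[str, int]]:
    """Walk the call stack outward and return (filename, lineno) of the
    first frame that is not inside the snowflake.snowpark package,
    i.e. the user's code that triggered the current operation."""
    for frame_info in inspect.stack():
        module = frame_info.frame.f_globals.get("__name__", "")
        if not module.startswith("snowflake.snowpark"):
            return frame_info.filename, frame_info.lineno
    return None
```

Because it needs no AST or protobuf support, a helper like this works for modin/snowpandas code paths as well.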
What about using this function?
Essentially we seem to have three approaches to this problem. I'm *less* of a fan of the AST approach because it doesn't help pandas for this type of debugging, but it seems like we might be able to consolidate with the open telemetry approach.
```python
_enable_dataframe_trace_on_error = False


def configure_development_features(
```
This is similar to what I would expect; see my comment on #3380 (comment).
I'm thinking of the following to provide a unified debug config experience:
```python
@experimental(version="1.33.0")
def debug_config(
    *,
    enable_eager_schema_validation=False,
    enable_dataframe_trace_on_error=False,
)
```

When users want to enable these features, they do:

```python
import snowflake.snowpark.context

snowflake.snowpark.context.debug_config(enable_eager_schema_validation=True)
# or
snowflake.snowpark.context.debug_config(
    enable_eager_schema_validation=True,
    enable_dataframe_trace_on_error=True,
)
```
Good idea. @sfc-gh-jrose and I are aligned on the name. Let me add the @experimental decorator as well.
Could not import the decorator due to circular import issues, but I added a warning there instead.
src/snowflake/snowpark/context.py
```python
def configure_development_features(
    *,
    enable_dataframe_trace_on_error: bool = False,
```
I think we should default to True. That way users that want a basic development mode can call this function without any parameters.
```python
    """A node representing a dataframe operation in the DAG that represents the lineage of a DataFrame."""

    def __init__(self, batch_id: int, stmt_cache: Dict[int, proto.Stmt]) -> None:
        self.batch_id = batch_id
```
Nit: I would argue that this isn't meant to be a batch ID anymore. Within each Python session that imports the Snowpark module, each AST ID for a Table or DataFrame will be a UID.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-2084165
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
See doc.