-
Notifications
You must be signed in to change notification settings - Fork 143
SNOW-2680714 : Fix time_series_agg() correctness and make it GA #4049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
3d4218c
5de9f44
525d17c
773b849
81421d9
f7d34f9
b70edb4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,7 +10,7 @@ | |
| build_expr_from_snowpark_column_or_col_name, | ||
| with_src_position, | ||
| ) | ||
| from snowflake.snowpark._internal.utils import experimental, publicapi, warning | ||
| from snowflake.snowpark._internal.utils import publicapi, warning | ||
| from snowflake.snowpark.column import Column, _to_col_if_str | ||
| from snowflake.snowpark.functions import ( | ||
| _call_function, | ||
|
|
@@ -629,7 +629,6 @@ def compute_lead( | |
|
|
||
| return df | ||
|
|
||
| @experimental(version="1.12.0") | ||
| @publicapi | ||
| def time_series_agg( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm curious how changing time_series_agg fixes the issue SNOW-2680714. Are we calling this function with the sliding_window_agg or is the customer passing this in?
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Made recent updates, Please take a look. We will exclude the rows with same timestamp with second granularity. |
||
| self, | ||
|
|
@@ -645,6 +644,9 @@ def time_series_agg( | |
| Applies aggregations to the specified columns of the DataFrame over specified time windows, | ||
| and grouping criteria. | ||
|
|
||
| .. note:: | ||
| Aggregations exclude rows within 1 second of the current timestamp to prevent data leakage. | ||
|
|
||
| Args: | ||
| aggs: A dictionary where keys are column names and values are lists of the desired aggregation functions. | ||
| windows: Time windows for aggregations using strings such as '7D' for 7 days, where the units are | ||
|
|
@@ -776,10 +778,13 @@ def time_series_agg( | |
| years=interval_args.get("y"), | ||
| ) | ||
|
|
||
| one_second = make_interval(seconds=1) | ||
| if window_sign > 0: | ||
| range_start, range_end = Window.CURRENT_ROW, interval | ||
| range_start = one_second | ||
| range_end = interval | ||
| else: | ||
| range_start, range_end = -interval, Window.CURRENT_ROW | ||
| range_start = -interval | ||
| range_end = -one_second | ||
|
|
||
| window_spec = ( | ||
| Window.partition_by(group_by) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this deserve a line in the CHANGELOG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.