fix: Validate SetStatisticsUpdate correctly#2866
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Thanks for the PR. I think we should just deprecate the top-level snapshot_id entirely.
For context, the "before model_validator` was added in 3b53edc#diff-769b43e1d8beaa86141f694679de2bbea3604a65f987a9acd7d9e9efca193b7eR181-R193 to maintain backwards compatibility and prep for deprecation
kevinjqliu
left a comment
There was a problem hiding this comment.
oops didnt mean to approve
|
@kevinjqliu Ok, do you want me to change the fix so that |
kevinjqliu
left a comment
There was a problem hiding this comment.
Looks like the java implementation has not deprecated the top-level snapshot_id. lets proceed with this change since it improves the current validation logic.
Thanks for the PR! Lets add a test and it should be good to go!
pyiceberg/table/update/__init__.py
Outdated
| @model_validator(mode="after") | ||
| def validate_snapshot_id(self) -> Self: | ||
| return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id}) |
There was a problem hiding this comment.
| @model_validator(mode="after") | |
| def validate_snapshot_id(self) -> Self: | |
| return self.model_copy(update={"snapshot_id": self.statistics.snapshot_id}) | |
| @model_validator(mode="after") | |
| def validate_snapshot_id(self) -> Self: | |
| self.snapshot_id = self.statistics.snapshot_id | |
| return self |
nit: use direct assignment
6f2b0d7 to
aa85657
Compare
@kevinjqliu Thanks for the (quick!) review. I've changed the fix a bit:
Please let me know if you want further changes. |
00cfd92 to
fa456e3
Compare
kevinjqliu
left a comment
There was a problem hiding this comment.
Thanks! Looks good, i found a small bug. Would be great to add a test for this case
| elif isinstance(stats, dict): | ||
| snapshot_id = stats.get("snapshot_id") | ||
|
|
There was a problem hiding this comment.
| elif isinstance(stats, dict): | |
| snapshot_id = stats.get("snapshot_id") | |
| elif isinstance(stats, dict): | |
| snapshot_id = stats.get("snapshot_id") | |
| else: | |
| snapshot_id = None | |
nit: i think we can inline the else here
pyiceberg/table/update/__init__.py
Outdated
| if isinstance(stats, StatisticsFile): | ||
| snapshot_id = stats.snapshot_id | ||
| elif isinstance(stats, dict): | ||
| snapshot_id = stats.get("snapshot_id") |
There was a problem hiding this comment.
| snapshot_id = stats.get("snapshot_id") | |
| snapshot_id = stats.get("snapshot-id") |
i think this should be snapshot-id since before validator takes in json as input
iceberg-python/pyiceberg/table/statistics.py
Lines 32 to 40 in fa03e08
There was a problem hiding this comment.
could you add a test case for this (and possibly one for the else case too)?
The current test only test the StatisticsFile instance branch
iceberg-python/tests/table/test_init.py
Lines 1370 to 1381 in fa03e08
There was a problem hiding this comment.
good catch with the snapshot-id. I've fixed that, and added a test specifically for the snapshot_id handling, creating a model both from a model instance, dict and json. the roundtrip testing also exercises the previous bug.
ba3eb24 to
65e0f64
Compare
Previously the pydantic @model_validator would fail because it assumed statistics was a model instance. In a "before"" validator that is not necessarily the case. Check type explicitly with isinstance instead, and handle `dict` case too.
65e0f64 to
e2517b0
Compare
Closes #2865
Previously the pydantic
@model_validatoronSetStatisticsUpdatewould fail because it assumed statistics was a model instance. In a "before"" validator that is not necessarily case.Check type and handle both models and dicts instead.
Before
After
Rationale for this change
Are these changes tested?
Yes, but only using the two-liners above.
Are there any user-facing changes?
No.