Fix: use new snapshot id in deleted manifest entry unless is existing entry #2266
tests/table/test_delete.py
    def assert_manifest_entry(expected_status: ManifestEntryStatus, expected_snapshot_id: int) -> None:
        current_snapshot = table.refresh().current_snapshot()
        manifest_files = current_snapshot.manifests(table.io)
Run `make lint` here: the snapshot calls return a union with null, so they need an assertion or an existence check before you call methods on them.
Thanks for checking! I see the CI failed on lint as well; I'll fix it in the next commit.
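For illustration, the kind of check the linter asks for could be sketched roughly as follows. `Snapshot`, `Table`, and `manifests_of_current_snapshot` are hypothetical stand-ins written for this example, not the real pyiceberg classes; the only point is narrowing an `Optional` return value before calling a method on it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical stand-in for a table snapshot."""

    snapshot_id: int
    manifest_paths: List[str] = field(default_factory=list)

    def manifests(self) -> List[str]:
        return self.manifest_paths


@dataclass
class Table:
    """Hypothetical stand-in for a table whose current snapshot may be missing."""

    snapshot: Optional[Snapshot] = None

    def current_snapshot(self) -> Optional[Snapshot]:
        return self.snapshot


def manifests_of_current_snapshot(table: Table) -> List[str]:
    current_snapshot = table.current_snapshot()
    # Narrow Optional[Snapshot] to Snapshot so a type checker accepts the call below.
    assert current_snapshot is not None, "table has no current snapshot"
    return current_snapshot.manifests()
```

An explicit `if current_snapshot is None: raise ...` would work as well; in a test, the assertion is usually the shorter option.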
tests/table/test_delete.py
        pass


    def test_manifest_entry_after_deletes(catalog: Catalog) -> None:
We can probably add this test to the existing test_deletes.py suite
Fokko left a comment
Thanks @lliangyu-lin for raising this PR, it does indeed look like we have a bug here 👍
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@kevinjqliu Hi Kevin, could you help re-enable the CI for this PR again?
kevinjqliu left a comment
This makes sense, thanks for the PR!
I was looking at all the places where ManifestEntry.from_args is used. Are there potentially other places we also need to change?
There are 2 other usages of
All other references of
Looks like we covered all the cases, WDYT?
Thanks @kevinjqliu for checking! I took a look and I think you are right that both of the cases are setting the snapshot id correctly. LGTM.
Thanks for the PR @lliangyu-lin and thanks for reviewing @Fokko @geruh
Rationale for this change
Per the Iceberg spec, when a manifest entry is marked as deleted, its snapshot id should be the id of the snapshot in which the entry was deleted.
https://iceberg.apache.org/spec/?h=deletes#manifest-entry-fields
https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestWriter.java#L178-L179
An incorrect snapshot id could lead to data being deleted during garbage collection when it is not supposed to be.
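As a rough sketch of the rule described above (not pyiceberg's actual implementation), the snapshot id choice could look like the following. `Status`, `Entry`, and `rewrite_entry` are hypothetical names introduced for this example; the status values follow the manifest entry table in the spec (0 EXISTING, 1 ADDED, 2 DELETED).

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Status values as listed in the Iceberg spec's manifest entry fields.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified manifest entry."""

    status: Status
    snapshot_id: int
    file_path: str


def rewrite_entry(entry: Entry, new_status: Status, current_snapshot_id: int) -> Entry:
    """Return a copy of `entry` as it would be written by the current snapshot."""
    if new_status is Status.EXISTING:
        # An existing entry keeps the id of the snapshot that originally added the file.
        snapshot_id = entry.snapshot_id
    else:
        # Added and deleted entries record the snapshot that performs the change.
        snapshot_id = current_snapshot_id
    return Entry(new_status, snapshot_id, entry.file_path)
```

For example, a file added in snapshot 10 and deleted in snapshot 20 would be written with snapshot id 20 in the deleted entry, while an entry carried over unchanged would keep snapshot id 10.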
Are these changes tested?
Added in test_deletes.py
Are there any user-facing changes?
No