Avro: Fix tests and add missing content header #2265
pyiceberg/manifest.py
Outdated
"parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null",
"sequence-number": str(sequence_number),
"format-version": "2",
"content": "data",
This header is missing, and should be set to data until we support MoR deletes.
Ah, interesting. It looks like ManifestWriterV2 currently appends "content": "data" (iceberg-python/pyiceberg/manifest.py, line 1165 in 904c0b7).
nit: should we remove "content": "data" from ManifestWriterV2? The content field is part of the manifest list, not the manifest file.
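For readers following the thread, here is a minimal sketch of the header metadata from the diff, written as a standalone function. The name `build_header_metadata` and its signature are hypothetical and are not part of pyiceberg; only the four keys and their string encoding come from the snippet under review.

```python
from typing import Dict, Optional


def build_header_metadata(
    sequence_number: int,
    parent_snapshot_id: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper mirroring the key/value pairs shown in the diff.

    Avro file metadata is a string-to-string map here, so every value is
    stringified, and a missing parent snapshot is written as "null".
    """
    return {
        "parent-snapshot-id": str(parent_snapshot_id) if parent_snapshot_id is not None else "null",
        "sequence-number": str(sequence_number),
        "format-version": "2",
        # Per the review comment: set to "data" until MoR deletes are supported.
        "content": "data",
    }


metadata = build_header_metadata(sequence_number=3)
```

Whether the `content` key belongs in this particular writer is exactly what the nit above questions, so treat the sketch as an illustration of the diff, not of the final merged code.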
While working on apache#2004 I've noticed some small discrepancies that I think would be good to address in a separate PR.
kevinjqliu left a comment
Cool! I was able to use apache/iceberg-rust#1328 to read both the manifest file and the v2 manifest list file.
kevinjqliu left a comment
LGTM! I tested with apache/iceberg-rust#1328. I had to pad the metric values to be exactly 8 bytes (and pushed the change).
I tested both the manifest file fixture (generated_manifest_entry_file) and the v2 manifest list fixture (generated_manifest_file_file_v2).
The v1 manifest list fixture (generated_manifest_file_file_v1) failed with:
Source: Failed to deserialize Avro value into value: missing field `content`
This is a known issue tracked in apache/iceberg-rust#1576.
I left a small nit about the content field currently in ManifestWriterV2.
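The 8-byte padding mentioned above can be sketched as follows. This assumes the metric values in question are `long` bounds, which Iceberg's single-value serialization stores as 8-byte little-endian integers. The comment does not say which fields were padded or how, so both helper names are hypothetical.

```python
import struct


def long_to_bound(value: int) -> bytes:
    """Serialize a long as 8 little-endian bytes (signed 64-bit)."""
    return struct.pack("<q", value)


def pad_bound_to_8_bytes(raw: bytes) -> bytes:
    """Right-pad a short little-endian value with zero bytes up to 8 bytes.

    Only valid for non-negative values: zero padding does not sign-extend.
    """
    if len(raw) > 8:
        raise ValueError(f"expected at most 8 bytes, got {len(raw)}")
    return raw.ljust(8, b"\x00")


padded = pad_bound_to_8_bytes(b"\x01")
decoded = struct.unpack("<q", padded)[0]
```

A strict reader that decodes bounds as fixed-width longs would reject the unpadded single byte, which is one plausible reason the fixtures needed the change.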
… into fd-fix-small-things
Thanks @kevinjqliu 🙌
Co-authored-by: Kevin Liu <kevin.jq.liu@gmail.com>
Follow-up of apache#2265 where some partition fields were renamed to avoid naming conflicts but the fixtures were not updated properly.