Hi authors, thanks a lot for releasing the code and dataset links — this is a really solid and practical piece of work.
I downloaded hutchinsonian/droid_processed from ModelScope, but most files under droid_processed_data_tar/ look incorrect.
No matter which method I use, almost all droid_processed_data_XXXX.tar.zst fail with:
zstd: unsupported format
In the rare case where a file is recognized as zstd, it can still be truncated: Read error: premature end
I've tried both download methods provided in the README:
Git LFS: git lfs clone https://www.modelscope.cn/hutchinsonian/droid_processed.git data/droid/_modelscope_droid_processed
ModelScope SDK / snapshot_download:
from modelscope.hub.snapshot_download import snapshot_download
snapshot_download(
    repo_id="hutchinsonian/droid_processed",
    repo_type="dataset",
    revision="master",
    local_dir="data/droid/_modelscope_droid_processed",
)
Same result for both.
I checked the first 4 bytes of all *.tar.zst files. Zstandard should start with magic 28 b5 2f fd, but in my case:
Total shards: 188
ZSTD magic matched: 1
Magic mismatch: 187
So most files are named .tar.zst but don’t seem to be Zstandard files at all.
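For anyone who wants to reproduce the count, here is a minimal sketch of this kind of magic-byte check. It is not necessarily the exact script used for the numbers above; the function names `has_zstd_magic` and `count_magic` are hypothetical, and the shard directory is passed in by the caller.

```python
from pathlib import Path

# Zstandard frames start with the 4 bytes 28 b5 2f fd.
ZSTD_MAGIC = bytes.fromhex("28b52ffd")


def has_zstd_magic(path):
    """Return True if the file's first 4 bytes are the Zstandard magic."""
    with open(path, "rb") as f:
        return f.read(4) == ZSTD_MAGIC


def count_magic(shard_dir):
    """Return (total, matched, mismatched) over *.tar.zst files in shard_dir."""
    shards = sorted(Path(shard_dir).glob("*.tar.zst"))
    matched = sum(has_zstd_magic(p) for p in shards)
    return len(shards), matched, len(shards) - matched
```

Calling `count_magic(...)` on the local copy of droid_processed_data_tar/ gives the three counts. Note that a matching magic only shows the file starts like a Zstandard frame; it does not rule out truncation later in the file.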
Is the dataset currently incomplete/corrupted on ModelScope?
If yes, could you re-upload/fix the tar shards, or share a verified alternative download (or checksums) that you know works?
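If checksums are an option, something like the sketch below could be run on both sides to compare shards. SHA-256 is only an assumption here (any agreed hash would do), and `sha256_of_file` is a hypothetical helper name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large shards do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A list of per-shard digests from a known-good copy would be enough to tell whether a local download is intact.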
Thanks again — I really appreciate the open-source release.