Output mmtf uses 64bit floats which violates the mmtf specification. #50

@zacharyrs

The specification defines the float type as 32-bit. Python floats are 64-bit, so when they are packed according to the template they are written to the output file as 64-bit floats. Other parsers (e.g. mmtf-java) try to load these as 32-bit floats, and hence fail. We can easily fix this by passing use_single_float=True to the msgpack.packb call.
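A minimal sketch of what `use_single_float` changes, assuming msgpack-python 1.x: the same Python float is packed as a 9-byte float 64 (marker `0xcb`) by default, and as a 5-byte float 32 (marker `0xca`) when the flag is set.

```python
import msgpack

value = 1.5  # a Python float, which is always 64-bit

double_bytes = msgpack.packb(value)  # default: use_single_float=False
single_bytes = msgpack.packb(value, use_single_float=True)

# msgpack marks float 64 with 0xcb and float 32 with 0xca.
print(hex(double_bytes[0]), len(double_bytes))  # 0xcb 9
print(hex(single_bytes[0]), len(single_bytes))  # 0xca 5
```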

However, it seems mmtf-java also violates the standard: it uses doubles (64-bit floats) for the ncsOperatorList, so even with the above change it still can't parse the output. Given that mmtf-java is used for the RCSB files, we can assume they won't switch to 32-bit floats, as that would break their parsing for even more files.

Additionally, msgpack-python does not support selecting doubles for only one field (msgpack/msgpack-python#326). Instead, you have to pack the biological assembly list separately and then combine the two, as in the collapsed snippet below.

Code for packing separately.
import struct

import msgpack

# The mmtf standard expects everything as 32bit - hence use_single_float.
# Note the encode_data no longer includes bioAssemblyList.
main = msgpack.packb(self.encode_data(), use_bin_type=True, use_single_float=True)

# Assemblies need to be 64bit for Java compatibility.
assemblies = msgpack.packb(
    {"bioAssemblyList": self.bio_assembly},
    use_bin_type=True,
    use_single_float=False,
)

# In msgpack, a map with more than 15 entries (map16, up to 65535 entries) starts with
# three bytes: `\xde` followed by the length as a big-endian uint16,
# e.g. `\xde\x00\x23` for 35 entries.
assert main[0] == 0xDE, "expected a map16 header"
(length,) = struct.unpack(">H", main[1:3])

# Build the new header: the map16 marker plus the length incremented by one for
# bioAssemblyList. struct handles all 16 bits, so this does not rely on the length
# fitting in a single byte.
new_map_length: bytes = struct.pack(">BH", 0xDE, length + 1)

# Strip the three-byte header (map marker plus two length bytes) from `main`.
main = main[3:]

# Strip the first byte from `assemblies`: a one-entry map is a fixmap with a
# single-byte header (`\x81`).
assemblies = assemblies[1:]

# Finally put it all back together.
new_data = new_map_length + main + assemblies
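An alternative sketch that avoids editing header bytes by hand, assuming msgpack-python's `Packer` class: `Packer.pack_map_header` writes only the map header, so each entry can then be packed with whichever float width that field needs. The `pack_mixed` helper and the example data are hypothetical, not part of this project's code.

```python
import msgpack

# Two packers that differ only in float width; with autoreset (the default)
# each call returns the packed bytes.
single = msgpack.Packer(use_bin_type=True, use_single_float=True)
double = msgpack.Packer(use_bin_type=True, use_single_float=False)


def pack_mixed(data: dict, double_keys: set) -> bytes:
    """Pack `data` as one msgpack map, using 64-bit floats only for values under `double_keys`."""
    out = [single.pack_map_header(len(data))]
    for key, value in data.items():
        out.append(single.pack(key))  # keys are strings, so float width is irrelevant
        packer = double if key in double_keys else single
        out.append(packer.pack(value))
    return b"".join(out)


# Hypothetical stand-in for encode_data() plus the assemblies.
data = {
    "exampleFloatField": [1.5, -2.25],
    "bioAssemblyList": [{"name": "1", "matrix": [1.0, 0.0]}],
}
packed = pack_mixed(data, {"bioAssemblyList"})
print(msgpack.unpackb(packed) == data)  # True
```

Because the header is written with the final entry count up front, this works the same whether the map ends up as a fixmap or a map16.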

For reference, I have also raised this issue in the mmtf-java repo: rcsb/mmtf-java#53.
