Make the data format less insane #27

@bvisness

The iongraph format is extremely, extremely verbose. At the very least, we should consider making a JSON format that has shorter property names, but we could also consider a binary format.

There is something nice about sticking with JSON, since it is easy to throw at existing tools and get basic info out. The big downsides are that the files can be very large and are difficult to navigate efficiently. Want to parse just function 80 out of 100? Too bad, you have to parse all the previous functions too.

That said, most of the data in an iongraph.json file is strings and small numbers, and JSON is not too wasteful in that case (although numbers may still be slower to parse than in a binary format). You still have a fair amount of "structural waste" from all the opening and closing braces; maybe that is ok. The "attributes" property is more verbose than it needs to be (a Real and Proper format would probably have string interning), but most blocks don't even have attributes anyway.
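For reference, string interning here could look something like the sketch below. It is only an illustration: the `strings` table, the `intern_attributes` helper, and the sample attribute values are assumptions, not part of the current format.

```python
def intern_attributes(blocks):
    """Replace each block's attribute strings with indices into one shared table."""
    table = []  # unique strings, in first-seen order
    ids = {}    # string -> index into table
    out = []
    for block in blocks:
        if "attributes" not in block:
            out.append(block)  # most blocks: nothing to intern
            continue
        refs = []
        for attr in block["attributes"]:
            if attr not in ids:
                ids[attr] = len(table)
                table.append(attr)
            refs.append(ids[attr])
        out.append({**block, "attributes": refs})
    return {"strings": table, "blocks": out}


# Hypothetical sample data; the attribute names are illustrative only.
sample = [
    {"id": 0, "attributes": ["loopheader"]},
    {"id": 1, "attributes": ["backedge", "loopheader"]},
    {"id": 2},
]
interned = intern_attributes(sample)
```

Each distinct string is stored once and referenced by a small integer, which only pays off if attributes repeat often.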

As for navigability, you could argue that the issue should be fixed at the source. If you aren't interested in 99 out of 100 functions, maybe your tooling should only emit iongraph JSON for one function. But it may also be useful to have larger snapshots of how an entire program was compiled, and if the file size is tolerable, I could see myself making use of this. But then you run into long parse times and high memory requirements from handling such large JSON data.

Two solutions come to mind: 1) just count braces to skip to the desired function (ignoring any braces inside string literals), and 2) insert an (optional?) acceleration field at the end of the file with the byte offset and size of each function in the preceding JSON. This is information that an emitter could easily track while emitting, and a tool could simply scan backwards from the end of the file to find the start of this record.
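A rough sketch of both options, to make the idea concrete. Everything here is an assumption for illustration: the root layout `{"functions":[...]}`, the `index` key name, and all of the helper names. None of it is an existing iongraph API.

```python
import io
import json

MARKER = b',"index":'  # hypothetical key for the acceleration field


def emit(functions, out):
    """Write {"functions":[...],"index":[[offset,size],...]} to a binary stream."""
    pos = 0
    index = []

    def put(data):
        nonlocal pos
        out.write(data)
        pos += len(data)

    put(b'{"functions":[')
    for i, fn in enumerate(functions):
        if i:
            put(b",")
        blob = json.dumps(fn, separators=(",", ":")).encode("utf-8")
        index.append([pos, len(blob)])  # byte offset and size, tracked while emitting
        put(blob)
    put(b"]")
    put(MARKER + json.dumps(index, separators=(",", ":")).encode("ascii") + b"}")


def read_index(f):
    """Option 2: scan backwards from the end of the file for the index field."""
    size = f.seek(0, io.SEEK_END)
    chunk = 4096
    while True:
        start = max(0, size - chunk)
        f.seek(start)
        tail = f.read(size - start)
        # The index holds only digits, brackets and commas, so the last
        # occurrence of the marker in the file is the real one.
        at = tail.rfind(MARKER)
        if at != -1:
            body = tail[at + len(MARKER):].rstrip()
            return json.loads(body[:-1])  # drop the root object's closing "}"
        if start == 0:
            raise ValueError("no index field found")
        chunk *= 2


def load_function(f, index, n):
    """Parse only function n, using its recorded byte offset and size."""
    offset, size = index[n]
    f.seek(offset)
    return json.loads(f.read(size))


def find_function_by_braces(data, n):
    """Option 1: return the raw bytes of function n by counting braces."""
    depth = 0
    seen = -1
    start = 0
    in_string = escaped = False
    for i, byte in enumerate(data):
        ch = chr(byte)
        if in_string:  # braces inside string literals do not count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            if depth == 2:  # assumed layout: depth-2 objects are functions
                seen += 1
                start = i
        elif ch == "}":
            if depth == 2 and seen == n:
                return data[start:i + 1]
            depth -= 1
    raise IndexError(n)
```

With the index, a tool reads only the tail of the file plus the one function it wants. Brace counting still has to touch every byte before the target, but it skips building any objects for the functions it passes over. The file also stays valid JSON either way, so tools that don't know about the index can ignore it.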

Anyway, there's some thoughts.
