Reporting errors in ndjson streams #1323

@pflanze

Context

ndjson is just lines of text, each representing one JSON document. If such a file is cut off, for example because the disk is full (browsers report that poorly or not at all), and the cut happens at a line boundary, then there is no way to detect it.
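To make the line-boundary problem concrete, here is a minimal Python sketch. The helper name `parse_ndjson` and the sample records are made up for illustration; it only shows that a cut in the middle of a line is noticed by the JSON parser, while a cut at a line boundary is not.

```python
import json

def parse_ndjson(text: str) -> list:
    """Parse each non-empty line as one JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

full = '{"foo":"abcd","bar":1234}\n{"foo":"efgh","bar":5678}\n'

# Cut off in the middle of the second line: the JSON parser complains.
cut_mid_line = full[:40]
try:
    parse_ndjson(cut_mid_line)
    mid_line_detected = False
except json.JSONDecodeError:
    mid_line_detected = True

# Cut off exactly at a line boundary: parses cleanly, although the
# second record is silently missing.
cut_at_boundary = full[: full.index("\n") + 1]
records = parse_ndjson(cut_at_boundary)
```

The second case returns one valid record and nothing indicates that another one was expected.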

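ndjson over HTTP: when the response is sent with its length announced in advance (Content-Length, not streaming), the client has a chance to detect an incomplete response. When streaming, this chance still exists if chunked transfer encoding is used, but not if the end of the body is signalled only by closing the connection ("Connection: close").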
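The three HTTP cases can be summarised in a small sketch. The function `truncation_detectable` is hypothetical, it only encodes the HTTP/1.1 framing rules described above, and it ignores other layers (for example, TLS or HTTP/2 framing may provide their own end-of-stream signal).

```python
def truncation_detectable(headers: dict[str, str]) -> bool:
    """Rough classification of an HTTP/1.1 response by its framing headers.

    Returns True if the framing itself lets a client notice a body that
    was cut short.
    """
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if "chunked" in h.get("transfer-encoding", ""):
        # The terminating zero-length chunk would be missing.
        return True
    if "content-length" in h:
        # Fewer bytes arrive than were announced.
        return True
    # The body simply ends when the connection closes, so an early close
    # looks the same as a normal end.
    return False
```

Whether a given HTTP client library actually surfaces these conditions as an error is a separate question that would need checking per library.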

ndjson over websockets: not sure.

Things like server or service restarts (e.g. SILO, LAPIS, nginx), router reconfigurations, maybe other things could lead to premature closing of a connection.

A related problem: when the server encounters an error late while producing a streamed response, there is even less possibility to signal that via the transfer protocol (lines of JSON within TCP or HTTP). The HTTP status has already been sent as 200 OK, and just closing the connection does not tell the client what went wrong, or in some cases that anything went wrong at all.

Approaches

Possible approaches, of which the 4th is currently preferred:

  1. In principle there is the concept of "trailer headers" (https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Trailer), i.e. sending more headers right after the last chunk of a chunked response. But browsers do not actually support it, and probably not much else does either, so it seems useless here.

  2. Do not do streaming, but instead have an API that delivers sized chunks, and have the client request chunk after chunk. Requires the database to correlate API call with query result (and clean up the correlation after a timeout), or be fast in recalculating the result set every time.

  3. Do not use ndjson, but an encoding with an implicit (start and) end boundary, like normal JSON ([ {..}, {..}, .. ]), together with a streaming parser. That creates the need for a streaming-capable parser, with gotchas when users don't know how to use it (e.g. even a streaming parser may be non-streaming by default). It also doesn't allow the transmission of errors.

  4. Make every line of ndjson able to represent a data point, legit end marker, or error: {"type": "data", "payload": ... } for a data point, {"type": "error", "error": ...} when an error comes up (possibly implying the end of the response), and {"type": "eof"} or {"type": "ok"} to mark the complete and successful ending of the response. Chaoran mentions that "Our normal LAPIS JSON responses also have the structure of {data: ... } and {error: ...} and it makes a lot of sense if the NDJSON objects also have the format".

  5. If the ndjson format is fixed because some tooling requires it (i.e. people want to download a file and then load it into an existing tool), then 4 can't be done. Alternatively, use a lone string as the ndjson data point to represent an error:

    {"foo":"abcd","bar":1234}
    {"foo":"efgh","bar":5678}
    "ERROR: SILO exception: arrow: array is too long, can only be 2 GiB"
    

    and if there is no error, end the document with an empty line (i.e. \n\n, hoping the existing tooling accepts that).

    If the receiver tries to blindly access field "foo" or "bar" on that error string value, it will get undefined, and some error downstream, which should at least raise awareness. If the developer is prepared for errors (because we document it clearly) then typeof obj == "string" allows checking for them.

  6. Do not use streaming, but instead prepare the whole result on the server, compress it with zstd, then send it. This might reduce download time even when including the time it takes for the preparation step (definitely if zstd were the bottleneck, but it won't be). This approach would also avoid the streaming error issue with inflexible formats like text/x-fasta.

  7. Still do streaming, but with zstd compression applied at the user data level rather than at the transport protocol (HTTP) level. This at least allows checking the result for completeness, since zstd will detect when the compressed stream is missing its end. Maybe a hack could also give the user a chance to see an error message: in case of error, stop the zstd stream, then append a couple of KB of whitespace followed by the error message.

  8. Provide a custom tool to download files that uses approach 2 or some other way (e.g. zstd-compressed stream multiplexed in a custom chunked format) to retrieve the result. (JavaScript in browser, Rust, Python.)
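To illustrate how a client might consume the format from approach 4, here is a minimal Python sketch. It assumes the end marker is spelled {"type": "eof"} (the issue leaves "eof" vs. "ok" open); the names `read_enveloped` and `StreamError` are hypothetical and not part of SILO or LAPIS.

```python
import json
from typing import Iterable, Iterator

class StreamError(Exception):
    """The stream reported an error or ended without an end marker."""

def read_enveloped(lines: Iterable[str]) -> Iterator[object]:
    """Yield payloads from enveloped ndjson lines, as in approach 4.

    Assumed line shapes: {"type": "data", "payload": ...},
    {"type": "error", "error": ...} and {"type": "eof"}.
    """
    for line in lines:
        if not line.strip():
            continue
        msg = json.loads(line)
        kind = msg.get("type") if isinstance(msg, dict) else None
        if kind == "data":
            yield msg["payload"]
        elif kind == "error":
            raise StreamError(f"server reported: {msg['error']}")
        elif kind == "eof":
            return  # complete and successful end of the response
        else:
            raise StreamError(f"unknown message type: {kind!r}")
    # The input ran out without an explicit end marker.
    raise StreamError("stream ended without an eof marker (truncated?)")
```

With this shape, a stream cut off at a line boundary is no longer silent: the reader runs out of lines before seeing the end marker and raises. The cost is that every consumer has to unwrap the envelope, which is the concern behind approach 5.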

Where

The connection between SILO and LAPIS could use arrow instead; but perhaps direct users of SILO want some solution too, thus perhaps implement the same in both services.
