Overlays don't work well with haplotype paths #158

@adamnovak

Right now, overlays like PackedPathPositionOverlay index only the paths that for_each_path_handle() returns.

For backward compatibility, for_each_path_handle() omits haplotype paths, at least for graphs like GBWTGraph where there are thousands of them. To enumerate haplotype paths, you need to use the more advanced PathMetadata search methods.

But this means that you can put a PackedPathPositionOverlay over a GBWTGraph and get a handle to a haplotype path by name, yet when you ask about positions on it, the path was never indexed during construction of the overlay. This in turn is going to break e.g. vg inject into a haplotype path.
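To make the gap concrete, here is a minimal toy model of the situation, not the real libbdsg or GBWTGraph code. ToyGraph, ToyPath, EagerPositionOverlay, and make_toy_graph are all hypothetical names invented for this sketch. The method names for_each_path_handle and get_path_handle are borrowed for readability, but their signatures here are simplified.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a path; not the libbdsg API.
struct ToyPath {
    std::string name;
    bool is_haplotype;
    std::vector<std::size_t> step_lengths; // length of the node under each step
};

// Toy stand-in for a graph with paths. A path handle is just an index here.
struct ToyGraph {
    std::vector<ToyPath> paths;

    // Mirrors the behavior described above: haplotype paths are skipped.
    void for_each_path_handle(const std::function<void(std::size_t)>& visit) const {
        for (std::size_t i = 0; i < paths.size(); ++i) {
            if (!paths[i].is_haplotype) {
                visit(i);
            }
        }
    }

    // Lookup by name still finds haplotype paths. Returns paths.size() if absent.
    std::size_t get_path_handle(const std::string& name) const {
        for (std::size_t i = 0; i < paths.size(); ++i) {
            if (paths[i].name == name) {
                return i;
            }
        }
        return paths.size();
    }
};

// Eager overlay: indexes only what for_each_path_handle() enumerates.
struct EagerPositionOverlay {
    // Path handle -> start offset of each step along that path.
    std::map<std::size_t, std::vector<std::size_t>> step_starts;

    explicit EagerPositionOverlay(const ToyGraph& g) {
        g.for_each_path_handle([&](std::size_t path) {
            std::vector<std::size_t> starts;
            std::size_t offset = 0;
            for (std::size_t len : g.paths[path].step_lengths) {
                starts.push_back(offset);
                offset += len;
            }
            step_starts.emplace(path, std::move(starts));
        });
    }

    bool is_indexed(std::size_t path) const {
        return step_starts.count(path) > 0;
    }
};

// One reference path and one haplotype path (names are made up).
ToyGraph make_toy_graph() {
    ToyGraph g;
    g.paths.push_back(ToyPath{"chr1", false, {3, 5, 2}});
    g.paths.push_back(ToyPath{"sample1#1#chr1", true, {3, 4, 2}});
    return g;
}
```

In this model, get_path_handle("sample1#1#chr1") returns a valid handle, but is_indexed() on that handle is false, so any position query on it has nothing to consult.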

The overlays need to be modified to handle haplotype paths. Either the haplotype paths need to be enumerated and processed (which is probably too slow), or they need to be detected and excluded (which is kind of useless because people probably want to be able to use them), or they need to be lazily indexed by the overlays (which might be inefficient but at least avoids indexing thousands of samples to inject into one).
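The lazy-indexing option could look roughly like the sketch below. This is a toy under stated assumptions, not a proposed patch: PathSteps and LazyPositionOverlay are hypothetical names, a path handle is modeled as a plain index, and the per-path index is a simple vector of step start offsets rather than the packed structures the real overlay uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a path; not the libbdsg API.
struct PathSteps {
    std::string name;
    std::vector<std::size_t> step_lengths; // length of the node under each step
};

// Hypothetical lazy overlay: a path's offsets are built on first query,
// then cached, so untouched haplotype paths cost nothing.
class LazyPositionOverlay {
public:
    // The overlay keeps a reference; the caller must keep `paths` alive.
    explicit LazyPositionOverlay(const std::vector<PathSteps>& paths)
        : paths_(paths) {}

    // Start offset of step `step` along path `path`.
    std::size_t step_start(std::size_t path, std::size_t step) {
        const std::vector<std::size_t>& starts = index_for(path);
        assert(step < starts.size());
        return starts[step];
    }

    // Number of paths indexed so far.
    std::size_t indexed_path_count() const {
        return cache_.size();
    }

private:
    // Return the cached index for a path, building it if this is the first query.
    const std::vector<std::size_t>& index_for(std::size_t path) {
        assert(path < paths_.size());
        auto found = cache_.find(path);
        if (found == cache_.end()) {
            std::vector<std::size_t> starts;
            std::size_t offset = 0;
            for (std::size_t len : paths_[path].step_lengths) {
                starts.push_back(offset);
                offset += len;
            }
            found = cache_.emplace(path, std::move(starts)).first;
        }
        return found->second;
    }

    const std::vector<PathSteps>& paths_;
    std::unordered_map<std::size_t, std::vector<std::size_t>> cache_;
};
```

One design cost this sketch glosses over: queries now mutate the cache, so they are not const and not thread-safe. If the real overlay's position queries are const and may be called from multiple threads, a lazy version would presumably need a mutable cache guarded by a lock, which is part of the inefficiency mentioned above.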
