FSTOrdPostingsFormat could enable faster Tagger #79

@dsmiley

The Lucene FSTOrdPostingsFormat (Solr schema postingsFormat="FSTOrd50") is like FSTPostingsFormat but has "ordinals", i.e. term ordinals. Most postings formats don't support ordinals, but this one does. In TermPrefixCursor.java I left a comment that it could be more efficient if we could use ordinals, and I think this might be true. Instead of eagerly reading & caching the postings (list of docIDs), we could just capture the ordinal (an int). This'd replace some of the "IntsRef" with this integer ordinal. TPC wouldn't need docIdsCache either. Later, when we resolve it in getDocIds(), that's when we'd do the actual work, which is perhaps not expensive. Sometimes we're never even asked to do that, which saves some time: the tag may have been eliminated due to overlapping, or it may have effectively been cached at a higher level (TaggerRequestHandler transforms to the uniqueKey values and then caches that).
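To make the idea concrete, here's a rough, self-contained sketch in plain Java. It is not Lucene or tagger code: `TermDict`, `PendingTag`, `ordOf`, `readPostings` and `postingsReads` are all made-up names that only model the shape of the change. In the real thing I'd presumably capture the value from `TermsEnum.ord()` (which returns a `long`, hence the `long` here) and resolve it later via `TermsEnum.seekExact(long)`, but that part is an assumption and untested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Toy model (not Lucene code) of deferring postings reads by holding only a term ordinal. */
public class LazyPostingsSketch {

    /** Hypothetical stand-in for a terms dictionary that supports ordinals. */
    static final class TermDict {
        private final List<String> terms = new ArrayList<>();
        private final List<int[]> postings = new ArrayList<>();
        int postingsReads = 0; // counts how often the "expensive" read happens

        TermDict(TreeMap<String, int[]> sorted) {
            // TreeMap iterates in sorted term order, so the list index is the ordinal.
            sorted.forEach((term, docs) -> {
                terms.add(term);
                postings.add(docs);
            });
        }

        /** Returns the ordinal of the term, or -1 if absent. No postings are read. */
        long ordOf(String term) {
            int idx = Collections.binarySearch(terms, term);
            return idx >= 0 ? idx : -1;
        }

        /** The deferred, potentially expensive step: read the doc IDs for an ordinal. */
        int[] readPostings(long ord) {
            postingsReads++;
            return postings.get((int) ord).clone();
        }
    }

    /** A candidate tag that remembers only the ordinal until doc IDs are requested. */
    static final class PendingTag {
        private final TermDict dict;
        private final long ord;
        private int[] docIds; // stays null until getDocIds() is first called

        PendingTag(TermDict dict, long ord) {
            this.dict = dict;
            this.ord = ord;
        }

        int[] getDocIds() {
            if (docIds == null) {
                docIds = dict.readPostings(ord);
            }
            return docIds;
        }
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("boston", new int[] {1, 4});
        index.put("new york", new int[] {2});
        index.put("new york city", new int[] {2, 7});
        TermDict dict = new TermDict(index);

        PendingTag shorter = new PendingTag(dict, dict.ordOf("new york"));
        PendingTag longer = new PendingTag(dict, dict.ordOf("new york city"));

        // Suppose overlap reduction keeps only the longer tag:
        // the shorter tag's postings are then never read at all.
        System.out.println(Arrays.toString(longer.getDocIds()));
        System.out.println("postings reads: " + dict.postingsReads);
    }
}
```

The point of the sketch is only that creating a `PendingTag` costs one number, and the postings read happens at most once per tag that actually survives to `getDocIds()`. Whether that beats the current eager read-and-cache in practice is exactly what I can't tell without measuring.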

I'm not sure how much benefit this would bring; it could even be a net loss. Hard to be sure without measuring.

The downside is that we'd basically be limited to this PostingsFormat. At least the PostingsWriterBase aspect of this one is pluggable (kinda), should we want future improvements that allow a totally in-memory option. To ameliorate this downside, we could support any PF by grabbing the "TermState" instead; presumably the termState of FSTOrdPostingsFormat is effectively the ordinal.
