Single-word lines do not form events #177

@kuchenrolle

What you were trying to do

Turn a one-sentence-per-line corpus into events, where some lines are only one word (headings).

What actually happened

The single-word lines do not form events.

How to reproduce

The problem is in the generation of occurrences. This is what happens there:

occurrences = list()
words = ["test"]
before = 2
after = 1
for ii, word in enumerate(words):
    # words before the word to a maximum of before
    cues = words[max(0, ii - before):ii]
    # words after the word to a maximum of after
    cues.extend(words[(ii + 1):min(len(words), ii + 1 + after)])
    # append (cues, outcomes)
    occurrences.append(("_".join(cues), word))

The loop has only one iteration, in which the word is the outcome without any cues, so it is dropped.
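
To make the effect easier to see, here is the same loop wrapped in a function so that a one-word line and a three-word line can be compared. The function name `make_occurrences` is made up for this illustration and is not taken from the library; the loop body is the one shown above.

```python
def make_occurrences(words, before, after):
    """Build (cues, outcome) pairs using the same loop as in the snippet above."""
    occurrences = []
    for ii, word in enumerate(words):
        # words before the word, to a maximum of `before`
        cues = words[max(0, ii - before):ii]
        # words after the word, to a maximum of `after`
        cues.extend(words[ii + 1:min(len(words), ii + 1 + after)])
        occurrences.append(("_".join(cues), word))
    return occurrences


# A one-word line yields a single occurrence whose cue string is empty.
print(make_occurrences(["test"], before=2, after=1))
# [('', 'test')]

# A longer line yields non-empty cues for every word.
print(make_occurrences(["a", "b", "c"], before=2, after=1))
# [('b', 'a'), ('a_c', 'b'), ('a_b', 'c')]
```

The empty cue string in the first result is what leaves the single-word line without a usable event.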

This is not necessarily a bug, but it was at least surprising to me. I would expect (or at least want) the word to be treated as a cue that predicts no outcome. I only noticed this by chance, because a cue that occurred exclusively on such lines was dropped. I haven't checked, but I assume the same holds for context structures other than "line".
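
A rough sketch of the behaviour I would have expected, purely as an illustration: when a word has no context words at all, keep it as a cue with an empty outcome instead of as an outcome with an empty cue. The function name `make_occurrences_keep_single` and the use of an empty string for "no outcome" are assumptions made for this sketch, not how the library works or necessarily how it should be fixed.

```python
def make_occurrences_keep_single(words, before, after):
    """Like the loop above, but a word without context becomes a cue with no outcome."""
    occurrences = []
    for ii, word in enumerate(words):
        cues = words[max(0, ii - before):ii]
        cues.extend(words[ii + 1:min(len(words), ii + 1 + after)])
        if cues:
            occurrences.append(("_".join(cues), word))
        else:
            # Assumption: represent "predicts no outcome" as an empty outcome string.
            occurrences.append((word, ""))
    return occurrences


print(make_occurrences_keep_single(["test"], before=2, after=1))
# [('test', '')]
```

Note that with `before=0` and `after=0` every word would take this branch, so a real fix would probably need to decide whether that is wanted.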
