Description
What you were trying to do
Turn a one-sentence-per-line corpus into events, where some lines are only one word (headings).
What actually happened
The single-word lines do not form events.
How to reproduce
The problem is in the generation of occurrences. This is what happens there:
occurrences = list()
words = ["test"]
before = 2
after = 1
for ii, word in enumerate(words):
    # words before the word, up to a maximum of `before`
    cues = words[max(0, ii - before):ii]
    # words after the word, up to a maximum of `after`
    cues.extend(words[(ii + 1):min(len(words), ii + 1 + after)])
    # append (cues, outcome)
    occurrences.append(("_".join(cues), word))
The loop runs for only one iteration, in which the word becomes the outcome and the cue list is empty, so the occurrence is dropped.
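To make this concrete, here is the same loop wrapped in a small function so it can be run on different inputs. The name `make_occurrences` is made up for this illustration and is not taken from the library; only the loop body follows the snippet above.

```python
def make_occurrences(words, before=2, after=1):
    """Collect (cues, outcome) pairs the same way as the loop above.

    Hypothetical helper for illustration only; not the library's API.
    """
    occurrences = []
    for ii, word in enumerate(words):
        # words before the word, up to a maximum of `before`
        cues = words[max(0, ii - before):ii]
        # words after the word, up to a maximum of `after`
        cues.extend(words[(ii + 1):min(len(words), ii + 1 + after)])
        occurrences.append(("_".join(cues), word))
    return occurrences


single = make_occurrences(["test"])
triple = make_occurrences(["a", "b", "c"])

print(single)  # [('', 'test')]
print(triple)  # [('b', 'a'), ('a_c', 'b'), ('a_b', 'c')]
```

For the single-word line the cue string is empty, which matches the case described above; with three words every outcome has at least one cue.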
This is not necessarily a bug, but it was at least surprising to me. I would expect (and want) the word to be treated as a cue that predicts no outcome. I only noticed this by chance, because a cue that occurred exclusively in single-word lines was dropped. I haven't checked, but I assume this holds for context structures other than "line" as well.
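A rough sketch of the behavior I would have expected, assuming an occurrence with an empty outcome string is acceptable downstream (I have not verified that). The function name `make_occurrences_keep_single` is hypothetical and only serves this illustration.

```python
def make_occurrences_keep_single(words, before=2, after=1):
    """Like the loop above, but keep context-free words as cues.

    Hypothetical sketch; assumes an empty outcome string is acceptable
    to whatever consumes the occurrences.
    """
    occurrences = []
    for ii, word in enumerate(words):
        cues = words[max(0, ii - before):ii]
        cues.extend(words[(ii + 1):min(len(words), ii + 1 + after)])
        if cues:
            occurrences.append(("_".join(cues), word))
        else:
            # No surrounding words: treat the word as a cue with no outcome.
            occurrences.append((word, ""))
    return occurrences


print(make_occurrences_keep_single(["test"]))    # [('test', '')]
print(make_occurrences_keep_single(["a", "b"]))  # [('b', 'a'), ('a', 'b')]
```

Lines with two or more words are unaffected; only a word with no context at all is turned into a cue.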