[.] appearing in (scrubbed?) content fields #3

@jeremydouglass

An example is in the content_scrubbed of a Reddit comment:

Maybe you mispoke, and that[.] fine, but don’t please don’t get pissy at me for taking what you wrote at face value. ... The first modern American research university was Johns Hopkins, which is based closely on the Humboldtian model. Humboldt[.] idea for the university put a big focus on research, required all students to have a strong foundation in the humanities (philosophy, history, literature, the arts).

All of the Reddit data is in content_scrubbed, so we need to find examples in the LexisNexis data to determine whether this was introduced by our scrubbing, by some Python Unicode issue, or by something else (and, in any case, where and how it should be fixed).
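One way to narrow this down is to compare marker counts between a raw field and the scrubbed field of the same document. The following is only a rough sketch: it assumes each document is a dict that has a raw `content` field alongside `content_scrubbed` (the raw field name is an assumption, not confirmed here), and the helper names are hypothetical.

```python
MARKER = "[.]"

def marker_counts(doc):
    """Count '[.]' in the raw and scrubbed text of one document dict.

    Assumes a raw 'content' field exists next to 'content_scrubbed';
    missing or None fields are treated as empty strings.
    """
    raw = doc.get("content") or ""
    scrubbed = doc.get("content_scrubbed") or ""
    return raw.count(MARKER), scrubbed.count(MARKER)

def introduced_by_scrubbing(doc):
    """True if the scrubbed text has more '[.]' markers than the raw text."""
    raw_n, scrubbed_n = marker_counts(doc)
    return scrubbed_n > raw_n
```

If a LexisNexis document shows more markers in content_scrubbed than in its raw text, that would point at the scrubbing step. Equal counts would suggest the marker was already present upstream.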

We should be careful about applying a replace fix blindly, as that might break the editorial [.] in legitimate examples:

This recalls Lincoln's thoughts on "Four-score and seven[.]"
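Rather than replacing blindly, a context-aware search could flag only the suspicious cases for review. This is a heuristic sketch, not a fix: it assumes a mangled marker sits directly after a letter and is followed by whitespace and a lowercase word (as in the two Reddit examples above), and the names `SUSPECT` and `find_suspects` are hypothetical.

```python
import re

# A letter, then "[.]", then whitespace and a lowercase letter.
# This matches "that[.] fine" and "Humboldt[.] idea" but not a marker
# followed by a closing quote, as in the Lincoln example.
SUSPECT = re.compile(r"(?<=[A-Za-z])\[\.\](?=\s+[a-z])")

def find_suspects(text, width=20):
    """Return a context snippet around each suspicious '[.]' in text."""
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in SUSPECT.finditer(text)
    ]
```

An editorial [.] that happens to be followed by a lowercase word would still be flagged, so the output is meant for manual inspection, not automatic replacement.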

Related discussion:
https://we1s.ryver.com/index.html#posts/2109151
