parse stream? #10

@Laeeth

Hi Marco.

Small enhancement request. (Apologies if it's already implemented and I didn't see it.)

Quite often one wants to parse a stream of JSON objects (for example from Twitter or the Reddit comment dump). It would be nice to have this supported in the library so that it is very easy to use. I have written a small range that does this, but it is quite crude and I have not paid attention to efficiency. I can make a pull request if you would like (and you can refine it later), but you may prefer to implement it yourself. Let me know.

Here is some very simple code to process Reddit comments:
https://gist.github.com/Laeeth/bbd08dd576cb7aeff444
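To make the request concrete, here is a rough Python sketch of the kind of lazy stream parser meant above. It is not the D range from the gist and not this library's API. The name `iter_json_stream` and the `chunk_size` parameter are made up for illustration, and it assumes the stream is a sequence of complete JSON values separated by whitespace (one object per line, as in the Reddit dump).

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_stream(stream: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Lazily yield each top-level JSON value read from a text stream.

    Hypothetical helper: values may be separated by any whitespace, so this
    covers newline-delimited dumps as well as concatenated objects.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)  # "" signals end of input
        buffer += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so do it here.
            while pos < len(buffer) and buffer[pos].isspace():
                pos += 1
            if pos == len(buffer):
                break
            try:
                value, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # input ended in the middle of a value
                break  # value is probably incomplete: read more input
            # A bare number or literal touching the end of the buffer might
            # continue in the next chunk, so wait before yielding it.
            if end == len(buffer) and chunk and not isinstance(value, (dict, list, str)):
                break
            yield value
            pos = end
        buffer = buffer[pos:]  # keep only the unparsed tail
        if not chunk:
            return


if __name__ == "__main__":
    sample = io.StringIO('{"author": "a", "score": 1}\n{"author": "b", "score": 2}\n')
    for comment in iter_json_stream(sample):
        print(comment["author"], comment["score"])
```

The sketch retries the decode from the start of an incomplete value whenever more input arrives, which is simple but wasteful for values much larger than `chunk_size`; an efficient implementation inside the library would presumably parse incrementally instead.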

The original comments are here:
https://archive.org/details/2015_reddit_comments_corpus

On one core it takes 35 minutes to process one month's data (35 GB), i.e. roughly 1 GB per minute.

Thanks for getting in touch by email. That was about something else. I have had to figure out some other things, but will respond shortly.

Laeeth.
