Hi Marco.
Small enhancement request (apologies if it's already implemented and I missed it).
Quite often one wants to parse a JSON stream (such as the Twitter stream or the Reddit comment dump). It would be nice to have that as part of the library, so that it's very easy to use. I have written a small range to do this, but it's quite crude and I haven't paid attention to efficiency. I can make a pull request if you would like (and you can refine it later), but you may prefer to implement it yourself. Let me know.
Here is some very simple code to process Reddit comments:
https://gist.github.com/Laeeth/bbd08dd576cb7aeff444
The original comments are here:
https://archive.org/details/2015_reddit_comments_corpus
On one core it takes 35 minutes to process one month's data (35 GB).
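To make the request concrete, here is a minimal sketch of the kind of streaming reader meant above. It is written in Python purely for illustration and is not the code from the gist; the name `iter_json_records` and the sample data are hypothetical. It assumes the stream holds one JSON object per line, which is an assumption about the input rather than something the library guarantees.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_records(stream: TextIO) -> Iterator[Any]:
    """Lazily yield one parsed JSON value per non-empty line of `stream`.

    Hypothetical helper: assumes newline-delimited JSON input.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Hypothetical sample input standing in for a comment dump.
sample = io.StringIO(
    '{"author": "alice", "score": 3}\n'
    '\n'
    '{"author": "bob", "score": 5}\n'
)

# Records are parsed one at a time, so memory use does not grow with file size
# until the caller chooses to collect them.
records = list(iter_json_records(sample))
total_score = sum(r["score"] for r in records)
```

Because the reader is a generator, a real file could be passed in place of `sample` and consumed record by record without loading the whole dump into memory.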
Thanks for getting in touch by email. That was about something else; I have had to figure out some other things first, but will respond shortly.
Laeeth.