- Stores news in the "data" folder in JSON format
- Splits the list of news into n chunks for further reading and tokenizing
- Creates n temporary files, one per chunk, where the news of that chunk is written
- Opens and groups all temporary files to tokenize them and write the tokens as plain text
- Writes every token to a JSON file as a list
- Adds the grammatical category to every word
Example in the "freqs" folder
- Counts how many times a word with a given tag appears in the corpus
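
The steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual code: the function names (`split_into_chunks`, `write_chunks`, `tokenize`, `read_and_tokenize`, `count_tagged`), the regex tokenizer, and the toy tag lookup are all assumptions made for the example. A real run would plug in a proper tokenizer and part-of-speech tagger.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path


def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def write_chunks(chunks, directory):
    """Write each chunk to its own temporary JSON file and return the paths."""
    paths = []
    for chunk in chunks:
        with tempfile.NamedTemporaryFile(
            "w", suffix=".json", dir=directory, delete=False, encoding="utf-8"
        ) as handle:
            json.dump(chunk, handle)
            paths.append(Path(handle.name))
    return paths


def tokenize(text):
    """Rough stand-in tokenizer (assumption): lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def read_and_tokenize(paths):
    """Read every temporary file back and collect the tokens of all news items."""
    tokens = []
    for path in paths:
        for article in json.loads(path.read_text(encoding="utf-8")):
            tokens.extend(tokenize(article))
    return tokens


def count_tagged(tagged_tokens):
    """Count how often each (word, tag) pair occurs."""
    return Counter(tagged_tokens)


if __name__ == "__main__":
    news = ["The cat sat.", "A dog barked.", "The cat ran."]
    # Toy tag lookup (assumption); a real tagger would replace this.
    toy_tags = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN"}

    with tempfile.TemporaryDirectory() as tmp:
        paths = write_chunks(split_into_chunks(news, 2), tmp)
        tokens = read_and_tokenize(paths)

    tagged = [(token, toy_tags.get(token, "VERB")) for token in tokens]
    for (word, tag), freq in count_tagged(tagged).most_common():
        print(word, tag, freq)
```

Passing the tagger's output to `count_tagged` as `(word, tag)` pairs keeps the same word under different tags in separate counts, which matches the last step in the list.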