I was thinking of creating a variant of the generator that uses a cache to store the most frequently occurring word sets and their corresponding lists of following words. This approach would take less space than building the entire word map, while being much faster than scanning through the entire text on every lookup, at least for the word sets that are cached. I'd like to work on this if it sounds good. Let me know what you think.
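
Here is a rough Python sketch of how such a cache might work. It is only an illustration under stated assumptions: the names `CachedFollowerLookup` and `generate`, the `order` and `capacity` parameters, and the eviction rule are all hypothetical and not taken from the existing generator. A word set that is not cached is resolved by a full scan of the text, and the cache keeps only the word sets with the most occurrences seen so far.

```python
import random


class CachedFollowerLookup:
    """Bounded cache mapping a word set to its list of following words.

    Hypothetical helper: none of these names come from the existing generator.
    """

    def __init__(self, words, order=2, capacity=1000):
        self.words = words        # the full text, already split into words
        self.order = order        # number of words in a word set
        self.capacity = capacity  # maximum number of cached word sets
        self.cache = {}           # word set (tuple) -> list of following words

    def _scan(self, key):
        # Fallback path: linear scan of the whole text for every occurrence of key.
        n = self.order
        return [
            self.words[i + n]
            for i in range(len(self.words) - n)
            if tuple(self.words[i:i + n]) == key
        ]

    def followers(self, key):
        if key in self.cache:
            return self.cache[key]
        found = self._scan(key)
        if len(self.cache) < self.capacity:
            self.cache[key] = found
        elif self.cache:
            # The length of a follower list equals the number of times the
            # word set was found in the text, so it serves as the frequency.
            # Evict the rarest cached word set only if the new one is more frequent.
            rarest = min(self.cache, key=lambda k: len(self.cache[k]))
            if len(found) > len(self.cache[rarest]):
                del self.cache[rarest]
                self.cache[key] = found
        return found


def generate(lookup, seed, length, rng=random):
    """Extend seed word by word until length is reached or no follower exists."""
    out = list(seed)
    while len(out) < length:
        options = lookup.followers(tuple(out[-lookup.order:]))
        if not options:
            break
        out.append(rng.choice(options))
    return out
```

With this layout, memory is bounded by `capacity` follower lists rather than by the number of distinct word sets in the text. The trade-off is that a cache miss still costs a full scan, and picking the rarest entry on eviction is linear in `capacity`, so a heap or similar structure might be worth considering for large caches.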