Thank you for all of this! Some comments: Could you say more about the Lexer -> Pattern shift in CharSequence2TokenSequence? It looks like the validateTopics function is adding stopwords during training. Is there a reference for this? I'm reluctant to make something available without fully understanding when users should and shouldn't use it. I'm planning to release the HPPC version as 2.1, and I'd like to see this as part of it.
Hi David, To make the [...] About the validateTopics function: the idea is to build a list of stopwords iteratively, from the words that appear as top words in multiple topics. This is similar to applying TF/IDF to topics instead of documents. I hope this helps.
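To make the idea more concrete, here is a minimal sketch of a single pass, not the actual validateTopics code. The class name `TopicStopwordSketch`, the method `sharedTopWords`, and the `minTopics` threshold are hypothetical names chosen for illustration. It counts in how many topics each word appears as a top word (a document-frequency count over topics) and returns the words at or above the threshold as stopword candidates.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical names, not part of the MALLET API.
public class TopicStopwordSketch {

    /**
     * Returns the words that occur among the top words of at least
     * minTopics topics. The result is sorted alphabetically.
     */
    public static Set<String> sharedTopWords(List<List<String>> topWordsPerTopic, int minTopics) {
        Map<String, Integer> topicCounts = new HashMap<>();
        for (List<String> topWords : topWordsPerTopic) {
            // De-duplicate so each word is counted at most once per topic.
            for (String word : new HashSet<>(topWords)) {
                topicCounts.merge(word, 1, Integer::sum);
            }
        }
        Set<String> candidates = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : topicCounts.entrySet()) {
            if (entry.getValue() >= minTopics) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up top words for three topics.
        List<List<String>> topics = Arrays.asList(
            Arrays.asList("said", "game", "team", "season"),
            Arrays.asList("said", "market", "stock", "bank"),
            Arrays.asList("said", "film", "music", "season"));
        // "said" is a top word in 3 topics and "season" in 2.
        System.out.println(sharedTopWords(topics, 2)); // prints [said, season]
    }
}
```

In an iterative setting, one would presumably add the returned words to the stoplist after each validation pass, so that later passes no longer see them; how often to validate and which threshold to use are left open in this sketch.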
Minor changes to the serialisation process; added a method to get the top words, along with their weights, per topic