A Bayesian POS Tagger in Perl 5. Algorithms implemented:
- Naive Bayes
- Complete Naive Bayes (CNB)
- Gibbs Sampling, with three output strategies: random extraction, most frequent over to_sample samples, and maximum per position
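
As a rough illustration of the first item in the list above, the sketch below tags a single word with Naive Bayes. It is written in Python for brevity and is not the repository's Perl code. The `features`, `train` and `tag_word` names, the choice of features (lowercased word and two-character suffix) and the add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def features(word):
    # Hypothetical feature set: the lowercased word and its last two characters.
    w = word.lower()
    return {"word": w, "suffix": w[-2:]}

def train(tagged_sentences):
    """Collect counts from sentences given as lists of (word, tag) pairs."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)  # (tag, feature name) -> value counts
    values = defaultdict(set)           # feature name -> values seen in training
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tag_counts[tag] += 1
            for name, value in features(word).items():
                feat_counts[(tag, name)][value] += 1
                values[name].add(value)
    return tag_counts, feat_counts, values

def tag_word(word, model):
    """Return the tag maximising log P(tag) + sum of log P(feature | tag)."""
    tag_counts, feat_counts, values = model
    total = sum(tag_counts.values())

    def score(tag):
        s = math.log(tag_counts[tag] / total)
        for name, value in features(word).items():
            # Add-one smoothing; the extra +1 reserves mass for unseen values.
            num = feat_counts[(tag, name)][value] + 1
            den = tag_counts[tag] + len(values[name]) + 1
            s += math.log(num / den)
        return s

    return max(sorted(tag_counts), key=score)

# Toy training data using Brown-style tags (illustrative only).
toy = [[("the", "at"), ("dog", "nn"), ("runs", "vbz")],
       [("the", "at"), ("cat", "nn"), ("sleeps", "vbz")]]
model = train(toy)
```

With this toy model, `tag_word("jumps", model)` falls back on the suffix feature, because the word itself was never seen in training.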
The ideas and algorithms are described in the following works:
- For CNB: Part of Speech Tagging with Naïve Bayes Methods
- For Gibbs Sampling (idea for the approach): Bayesian Analysis in Natural Language Processing
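
To make the Gibbs Sampling variants more concrete, here is a minimal Python sketch. It is not the repository's Perl code. It assumes an HMM-style model with fixed transition and emission tables, and it reads the three options as: return one sample at random, return the most frequent whole tag sequence among the samples, or take the most frequent tag at each position. That reading, the function names, the toy probabilities and the absence of burn-in are all assumptions.

```python
import random
from collections import Counter

def gibbs_samples(words, tags, trans, emit, to_sample, rng):
    """Draw to_sample tag sequences; each sweep resamples every position in turn."""
    state = [rng.choice(tags) for _ in words]
    samples = []
    for _ in range(to_sample):
        for i, word in enumerate(words):
            weights = []
            for t in tags:
                # Weight of tag t given the word and the current neighbouring tags.
                w = emit[t].get(word, 1e-6)  # small floor for unseen pairs
                w *= trans[state[i - 1]][t] if i > 0 else 1.0
                w *= trans[t][state[i + 1]] if i + 1 < len(words) else 1.0
                weights.append(w)
            state[i] = rng.choices(tags, weights=weights)[0]
        samples.append(tuple(state))
    return samples

def random_extraction(samples, rng):
    """Return one sampled sequence chosen uniformly at random."""
    return rng.choice(samples)

def most_frequent_sequence(samples):
    """Return the whole tag sequence that occurs most often among the samples."""
    return Counter(samples).most_common(1)[0][0]

def max_per_position(samples):
    """Return, for each position, the tag sampled most often at that position."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*samples))

# Toy tables (made-up numbers, for illustration only).
tags = ["at", "nn", "vbz"]
trans = {"at":  {"at": 0.05, "nn": 0.90, "vbz": 0.05},
         "nn":  {"at": 0.10, "nn": 0.20, "vbz": 0.70},
         "vbz": {"at": 0.50, "nn": 0.30, "vbz": 0.20}}
emit = {"at": {"the": 0.9}, "nn": {"dog": 0.5, "runs": 0.1}, "vbz": {"runs": 0.6}}

rng = random.Random(0)
samples = gibbs_samples(["the", "dog", "runs"], tags, trans, emit, 20, rng)
```

The last two strategies can disagree: the per-position result combines the winning tag from each column, so it may be a sequence that never appeared as a whole among the samples.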
The dataset used is the Brown corpus. You can find it here. The meaning of the tags can be found here.
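
For reference, the sketch below shows one way to read tagged text, assuming the copy of the corpus you download stores each token as `word/tag` separated by whitespace (a common distribution format for the Brown corpus, but check your files). It is written in Python for illustration and is not the repository's Perl code. `parse_line` is a hypothetical helper, and the sample line is made up.

```python
def parse_line(line):
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last slash, since the word itself may contain slashes.
        word, sep, tag = token.rpartition("/")
        if sep:  # skip tokens that carry no tag
            pairs.append((word, tag))
    return pairs

sentence = parse_line("The/at jury/nn said/vbd ./.")
```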