PD-Webpage-Classifier

Builds a labeled training set and uses supervised learning (text classification) to train a model that predicts whether a webpage is relevant or not relevant, based on features extracted from the page's URL and title.
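To make the approach concrete, here is a minimal, dependency-free sketch of text classification over URL and title tokens. It is an illustration under stated assumptions, not the code in WebpageClassifier.py: the file name multinomial_classifier.pkl suggests a multinomial model, so the sketch uses multinomial Naive Bayes with add-one smoothing, but the tokenizer, the `TinyMultinomialNB` class, and the toy samples are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url, title):
    """Lowercase the URL and title and split them into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", f"{url} {title}".lower())


class TinyMultinomialNB:
    """Hypothetical multinomial Naive Bayes over token counts (add-one smoothing)."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for (url, title), label in zip(samples, labels):
            tokens = tokenize(url, title)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, url, title):
        tokens = tokenize(url, title)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from balancedTrainingSet.csv.
samples = [
    ("https://example.org/asteroid-impact-risk", "Asteroid impact risk assessment"),
    ("https://example.org/near-earth-objects", "Near-Earth object tracking"),
    ("https://example.com/recipes/pasta", "Easy pasta recipes"),
    ("https://example.com/sports/scores", "Latest football scores"),
]
labels = ["relevant", "relevant", "not relevant", "not relevant"]

model = TinyMultinomialNB().fit(samples, labels)
prediction = model.predict(
    "https://example.net/asteroid-news", "New asteroid tracking results"
)
print(prediction)  # relevant
```

In practice a library implementation (for example a bag-of-words vectorizer feeding a multinomial Naive Bayes estimator) would replace the hand-written class; the sketch only shows how URL and title text can drive a relevant / not relevant decision.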

Notes on Current Training Set

The current training set contains 1,271 samples: 700 relevant and 571 not relevant. A model trained on this set currently achieves an F1 score of 0.84 for the "relevant" class.
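For readers unfamiliar with the metric, the sketch below checks the class balance from the counts above and shows how an F1 score is derived from precision and recall. The confusion counts passed to `f1_score` are hypothetical and chosen only to illustrate the formula; they are not the results behind the reported score, and the helper name is an assumption rather than a function from AccuracyCalc.py.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Class counts stated in the notes above.
relevant, not_relevant = 700, 571
total = relevant + not_relevant
relevant_share = relevant / total

print(total)                     # 1271
print(round(relevant_share, 3))  # 0.551

# Hypothetical confusion counts for the "relevant" class (illustration only).
example_f1 = f1_score(tp=90, fp=30, fn=10)
print(round(example_f1, 3))      # 0.818
```

Because the two classes are close to balanced (roughly 55% relevant), the per-class F1 is a reasonable summary here; on a heavily skewed set it would need to be read alongside precision and recall separately.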

Repository files:

AccuracyCalc.py
PlotAccuracy.py
README.md
TrainingSetCreator.py
WebpageClassifier.py
balancedTrainingSet.csv
multinomial_classifier.pkl
