This is a project I am working on to introduce myself to the world of Machine Learning, using scikit-learn.
NOTE: This project is still in progress.
I am training a Machine Learning model to predict whether a given URL is a phishing link.
- Clone the repo:

      git clone https://github.com/SiddDevCS/PhishingML.git

- Go to the project folder (by default, git clone names it after the repo):

      cd PhishingML/

- Set up a venv (virtual environment):

      python3 -m venv venv
      source venv/bin/activate

- Install the libraries:

      pip install -r requirements.txt

- Train the model and save it into the models/ directory:

      python3 notebooks/train_model.py

- Finally, start the Flask web app:

      python3 app/app.py

- Visit the web app in your browser at: http://127.0.0.1:5000/
- Datasets in JSON for the model to be trained on.
- tldextract categorizing/splitting up the given link.
- Training the ML model.
- The ML model reporting whether the given link is a phishing link or not.
phishing-detector/
├── data/
│   ├── fetch.py             # Script to fetch phishing datasets (use a VPN to avoid getting blocked)
│   └── phish-data.json      # JSON datasets
├── notebooks/
│   ├── load_data.py         # Loads the JSON into a DataFrame
│   └── train_model.py       # Trains the model and saves it in models/
├── models/
│   └── phishing_model.pkl   # Trained ML model
├── app/
│   ├── app.py               # Flask web app
│   ├── extract_features.py  # Splits up the link with tldextract
│   ├── static/
│   │   └── style.css        # UI
│   └── templates/
│       └── index.html       # UI
├── README.md
└── requirements.txt
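
The train_model.py and phishing_model.pkl entries above suggest a train-then-pickle workflow. Below is a minimal sketch of that idea with scikit-learn, under several assumptions: the estimator (`RandomForestClassifier`), the helper names `train_and_save` and `load_and_predict`, and the toy feature rows are all made up for illustration and may differ from what the project actually uses.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier


def train_and_save(X, y, model_path):
    """Fit a classifier on numeric feature rows and pickle it to model_path."""
    # Assumption: a random forest; the real project may use another estimator.
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X, y)
    model_path = Path(model_path)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    return model


def load_and_predict(model_path, feature_row):
    """Load a pickled model and return its label (0 or 1) for one feature row."""
    with Path(model_path).open("rb") as fh:
        model = pickle.load(fh)
    return int(model.predict([feature_row])[0])


if __name__ == "__main__":
    # Toy rows of [url_length, subdomain_count, digit_count]; labels: 1 = phishing.
    X = [[20, 0, 0], [25, 1, 0], [80, 3, 5], [95, 4, 7]]
    y = [0, 0, 1, 1]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "models" / "phishing_model.pkl"
        train_and_save(X, y, path)
        print(load_and_predict(path, [90, 3, 6]))
```

Only unpickle model files you created yourself, since loading a pickle can execute arbitrary code.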
This project is licensed under the MIT License - see the LICENSE file for details.