SiddDevCS/PhishingML

Phishing ML


Intro

This is a project I am working on to introduce myself to machine learning, using scikit-learn.

NOTE: This project is still in progress.


What?

I am training a Machine Learning model to predict whether a given URL is a phishing link.


Get started

  1. Clone the repo:
git clone https://github.com/SiddDevCS/PhishingML.git
  2. Go to the project folder:
cd PhishingML/
  3. Set up a virtual environment (venv):
python3 -m venv venv
source venv/bin/activate
  4. Install the required libraries:
pip install -r requirements.txt
  5. Train the model and save it to the models/ directory:
python3 notebooks/train_model.py

Note: if there is already a model in models/, delete the new model made with train_model.py.

  6. Finally, start the Flask web app:
python3 app/app.py
  7. Visit the web app in your browser at: http://127.0.0.1:5000/

Workflow

  1. Phishing datasets are stored as JSON for the model to train on.
  2. tldextract splits each given link into its parts (subdomain, domain, suffix).
  3. The ML model is trained.
  4. The ML model outputs whether a given link is a phishing link or not.

Project Structure:

phishing-detector/
├── data/
├────── fetch.py                # Script to fetch phishing datasets (use a VPN to avoid being blocked)
├────── phish-data.json         # JSON datasets
├── notebooks/              
├────── load_data.py            # loads JSON into dataframe
├────── train_model.py          # trains/creates model in models/ dir
├── models/      
├────── phishing_model.pkl      # Trained ML model
├── app/
├────── app.py
├────── extract_features.py     # tldextract splitting up link
├────── static/
├───────────── style.css        # UI
├────── templates/
├───────────── index.html       # UI
├── README.md
└── requirements.txt
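
The structure above stores the trained model as models/phishing_model.pkl. A minimal sketch of how such a file could be written after training and read back before predicting is shown below. It assumes the model is saved with Python's standard `pickle` module, which the .pkl extension suggests but the README does not confirm. The toy features, the `LogisticRegression` model and the temporary directory are stand-ins for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LogisticRegression

# Toy numeric features and labels, made up for illustration (1 = phishing).
X = [[20, 1], [25, 2], [80, 6], [95, 7]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

# A temporary directory stands in for the repo's models/ folder.
models_dir = Path(tempfile.mkdtemp()) / "models"
models_dir.mkdir(parents=True, exist_ok=True)
model_path = models_dir / "phishing_model.pkl"

# Save the trained model to disk.
with model_path.open("wb") as f:
    pickle.dump(model, f)

# Load it back, as a separate script such as a web app might do at startup.
with model_path.open("rb") as f:
    loaded = pickle.load(f)

print(loaded.predict([[90, 6]]))
```

Only unpickle files you trust: loading a pickle can execute arbitrary code.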

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

ML model detecting phishing URLs using scikit-learn and tldextract.