This repository contains the implementation of my MSc dissertation project, "Training AI for Information Security." The project uses machine learning to detect and classify cyber threats in network traffic, specifically employing transformer-based models for zero-shot classification.
To install the project, follow these steps:
- Clone the repository:
  git clone https://github.com/niting3c/AiPacketClassifier.git
- Change directory to the cloned repository:
  cd AiPacketClassifier
- Install Conda if you haven't done so already. You can download it from the Conda website.
- Create a Conda environment using the provided environment.yml file:
  conda env create -f environment.yml
- Activate the Conda environment:
  conda activate AiPacketClassifier
Note: This project has been tested on Python 3.9.5, and the required dependencies are listed in the environment.yml file.
Here are detailed descriptions of the main files in this repository:
- run.py: This is the main script that initializes multiple zero-shot classification models from the Transformers library, processes input files with each model, and writes the results. It uses the following functions:
  - load_models(): Loads the transformer models specified in the models.py file and initializes the zero-shot classifiers.
  - process_files(model_entry, directory): Processes pcap files in the given directory using the specified model_entry. This function calls analyse_packet() and send_to_llm_model() for each pcap file.
- utils.py: This script contains helper functions to handle file-related operations such as creating file paths. It provides the following functions:
  - create_result_file_path(file_path, extension=".txt", output_dir="./output/", suffix="model"): Generates a new file path for a result file in the output directory. The file_path parameter specifies the original file path, extension specifies the desired file extension for the new file, output_dir specifies the directory for the new file (default is "./output/"), and suffix specifies the extra folder inside the directory for easier segregation (default is "model").
  - get_file_path(root, file_name): Generates a file path by combining the provided root and file_name.
- promptmaker.py: This script includes functions that generate prompts for the classification tasks. These prompts help guide the AI in its analysis of packets and instruct it on how to report its findings. It provides the following function:
  - generate_prompt(protocol, payload): Generates a formatted prompt with the specified protocol and payload to be used as input for the transformer models.
- pcapoperations.py: This script contains functions that handle pcap file operations, including reading packets from pcap files, analyzing packets using the zero-shot classification models, and writing the results to an output file. It provides the following functions:
  - process_files(model_entry, directory): Processes pcap files in the given directory using the specified model_entry. This function calls analyse_packet() and send_to_llm_model() for each pcap file.
  - analyse_packet(file_path, model_entry): Analyzes the packets in the pcap file located at file_path using the specified model_entry. This function extracts the protocol and payload from each packet and prepares input objects for classification.
  - extract_payload_protocol(packet): Extracts the payload and protocol from the packet.
  - send_to_llm_model(model_entry, file_name): Sends the prepared input objects to the zero-shot model for classification and stores the results in the model_entry.
- llm_model.py: This script includes functions that handle the interaction with the transformer models. It prepares the inputs for the classifier, generates the classifier's response, and processes the response.
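As a rough illustration of the path helpers described for utils.py, the sketch below shows one way create_result_file_path and get_file_path could behave. The function names and default arguments come from the descriptions above. The bodies are assumptions rather than the repository's actual code: in particular, the sketch assumes that the result file keeps the input file's base name and that suffix becomes a subfolder of output_dir.

```python
import os


def get_file_path(root, file_name):
    """Join a directory and a file name into a single path."""
    return os.path.join(root, file_name)


def create_result_file_path(file_path, extension=".txt",
                            output_dir="./output/", suffix="model"):
    """Build the path of a result file for the given input file.

    Assumption: the result keeps the input's base name, swaps its
    extension, and is placed in output_dir/suffix/.
    """
    base_name = os.path.splitext(os.path.basename(file_path))[0]
    return os.path.join(output_dir, suffix, base_name + extension)


if __name__ == "__main__":
    # Hypothetical input file and suffix, used only for illustration.
    pcap_path = get_file_path("./inputs", "sample.pcap")
    print(create_result_file_path(pcap_path, suffix="bart-large-mnli"))
```

Using the model name as the suffix would keep each model's results in its own subfolder, which matches the "easier segregation" purpose described for that parameter.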
To run the project, follow these steps:
- Make sure you have installed all necessary packages and activated the Conda environment (see Installation).
- The run.py script expects input files to be located in the ./inputs directory. Make sure you have populated this directory with your pcap files for processing.
- To start the program, simply run:
  python run.py
- The results will be written to the ./output directory.
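The input convention described above can be pictured as a simple directory walk. The following is a minimal sketch, not the code in run.py: the helper name find_pcap_files and the accepted extensions (.pcap and .pcapng) are assumptions made for this example.

```python
import os


def find_pcap_files(directory="./inputs", extensions=(".pcap", ".pcapng")):
    """Return the sorted paths of capture files found under directory.

    Hypothetical helper: the name and the extension filter are
    assumptions for illustration, not part of the repository.
    """
    wanted = tuple(extensions)  # str.endswith needs a tuple, not a list
    matches = []
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.lower().endswith(wanted):
                matches.append(os.path.join(root, file_name))
    return sorted(matches)


if __name__ == "__main__":
    # Lists the capture files that would be picked up from ./inputs.
    for path in find_pcap_files():
        print(path)
```

Running a check like this before python run.py is a quick way to confirm that the ./inputs directory actually contains capture files.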
The project uses the following transformer models for zero-shot classification tasks:
- Deep Night Research's ZSC Text
- Facebook's BART Large MNLI
- Moritz Laurer's DeBERTa v3 base MNLI+FEVER+ANLI
- Sileod's DeBERTa v3 base tasksource NLI
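For readers unfamiliar with zero-shot classification, the sketch below shows how one of the listed models could be loaded through the Transformers pipeline API and how its output can be reduced to a single label. It is an illustration under assumptions, not the project's code: the candidate labels, the example prompt text, and the helper names load_classifier, top_label, and demo are hypothetical, and facebook/bart-large-mnli is used as the Hugging Face Hub identifier of the BART model.

```python
def load_classifier(model_name="facebook/bart-large-mnli"):
    """Create a zero-shot classification pipeline for model_name.

    The import is kept local so the rest of this sketch can be used
    without loading the Transformers library or downloading a model.
    """
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model_name)


def top_label(classifier, text, candidate_labels):
    """Return the highest-scoring (label, score) pair for text.

    A zero-shot pipeline returns a dict with aligned "labels" and
    "scores" lists; the best pair is picked explicitly here.
    """
    result = classifier(text, candidate_labels=candidate_labels)
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


def demo():
    """Classify one made-up prompt (downloads the model on first use)."""
    # Hypothetical labels and prompt text, chosen only for illustration.
    labels = ["malicious", "benign"]
    prompt = "Protocol: HTTP. Payload: GET /index.html HTTP/1.1"
    label, score = top_label(load_classifier(), prompt, labels)
    print(f"{label} ({score:.2f})")
```

Calling demo() runs the full path from loading the model to printing a label. Passing the classifier into top_label as an argument means the same function could be reused with any of the models listed above.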
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Distributed under the MIT License. See LICENSE for more information.
Nitin Gupta - nitin.gupta.22@ucl.ac.uk
Project Link: https://github.com/niting3c/AiPacketClassifier
For specific requests or inquiries, feel free to contact me. Happy coding!