Fault Description to Complaint Classification using BERT

This project classifies automotive fault descriptions into predefined complaint categories using a BERT-based model. It also extracts the root cause and failure mode from each fault description.

Project Overview

Data Preprocessing:
- Extract keywords from fault descriptions.
- Identify root causes and failure modes.
Model Training:
- Tokenize the input data using BERT tokenizer.
- Train a BERT model for sequence classification on the preprocessed data.
- Save the trained model and tokenizer.
Prediction:
- Load the trained model and tokenizer.
- Predict complaints for new fault descriptions.
- Save predictions to an Excel file.
Online Learning:
- Retrain the model on new predictions so that it adapts to new data over time.

Installation

Prerequisites

Python 3.6+
PyTorch
Transformers (Hugging Face)
spaCy
NLTK
pandas
openpyxl

Setup

Clone the repository:

git clone https://github.com/jatintop/fault-description-classification.git
cd fault-description-classification

Install dependencies:

pip install -r requirements.txt
python -m spacy download en_core_web_lg

Set up NLTK stemmer:

import nltk
nltk.download('snowball_data')

Usage

Data Preprocessing

The script preprocesses fault descriptions by extracting keywords, root causes, and failure modes.

import spacy
from nltk.stem.snowball import SnowballStemmer

# Initialize spaCy and the Snowball stemmer
nlp = spacy.load("en_core_web_lg")
stemmer = SnowballStemmer(language='english')

def extract_keywords(text):
    # Keep alphabetic, non-stop-word tokens and reduce each one to its stem
    doc = nlp(text)
    keywords = [stemmer.stem(token.text) for token in doc if not token.is_stop and token.is_alpha]
    return ' '.join(keywords)

def extract_root_cause(text):
    # Root cause extraction logic
    pass

def extract_failure_mode(text):
    # Failure mode extraction logic
    pass
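
The two extraction functions above are left as stubs. As one possible starting point, the sketch below uses simple rule-based matching: causal cue phrases for the root cause and a small term list for the failure mode. The cue phrases, the FAILURE_MODES list, and the `_sketch` function names are illustrative assumptions, not the project's actual logic.

```python
import re

# Hypothetical causal cue phrases; a real project would tune these to its data.
CAUSE_PATTERN = re.compile(
    r"\b(?:due to|caused by|because of|as a result of)\s+([^.;,]+)",
    re.IGNORECASE,
)

# Hypothetical failure-mode vocabulary (an assumption, not taken from the project).
FAILURE_MODES = ["leak", "crack", "corrosion", "overheat", "noise", "vibration"]


def extract_root_cause_sketch(text):
    """Return the phrase following the first causal cue, or None."""
    match = CAUSE_PATTERN.search(text)
    return match.group(1).strip() if match else None


def extract_failure_mode_sketch(text):
    """Return the first listed failure-mode term found in the text, or None."""
    lowered = text.lower()
    for mode in FAILURE_MODES:
        # Match the term as a word prefix so "leaking" and "leaks" also count.
        if re.search(r"\b" + re.escape(mode), lowered):
            return mode
    return None


example = "Coolant leaking from radiator due to cracked hose clamp."
print(extract_root_cause_sketch(example))
print(extract_failure_mode_sketch(example))
```

Terms are checked in list order, so a description that mentions several failure modes returns whichever appears first in FAILURE_MODES.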

Model Training

Train a BERT model using fault descriptions and complaint categories.

from transformers import BertTokenizerFast, BertForSequenceClassification, Trainer, TrainingArguments
import torch

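# Tokenize and encode data
# `keywords` is the list of preprocessed descriptions; `complaints` holds the matching category names
# `label_dict` maps each category name to an integer class id
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
encodings = tokenizer(keywords, truncation=True, padding=True, max_length=128, return_tensors='pt')
labels = torch.tensor([label_dict[label] for label in complaints])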

# Split data into training and validation sets
# Create datasets and train the model
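
The snippet above relies on label_dict and mentions a train/validation split without showing either. A minimal sketch of both steps follows, assuming complaints is a list of category strings aligned with the descriptions. The helper names build_label_maps and split_indices are hypothetical, and the 80/20 ratio and fixed seed are arbitrary choices.

```python
import random


def build_label_maps(complaints):
    """Map each distinct complaint category to an integer id and back."""
    categories = sorted(set(complaints))  # sorted for a reproducible mapping
    label_dict = {label: idx for idx, label in enumerate(categories)}
    index_to_label = {idx: label for label, idx in label_dict.items()}
    return label_dict, index_to_label


def split_indices(n_samples, val_fraction=0.2, seed=42):
    """Shuffle sample indices and split them into train and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(n_samples * val_fraction))
    return indices[n_val:], indices[:n_val]


# Toy data for illustration only.
complaints = ["Brake Noise", "Oil Leak", "Brake Noise", "Battery Drain", "Oil Leak"]
label_dict, index_to_label = build_label_maps(complaints)
train_idx, val_idx = split_indices(len(complaints))
print(label_dict)
print(len(train_idx), len(val_idx))
```

The two index lists can then be used to select rows from the tokenized inputs and the label tensor before wrapping them in datasets for the Trainer.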

Prediction

Load the trained model and tokenizer to predict complaints for new fault descriptions.

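# Load the fine-tuned model and tokenizer from the directory used by save_pretrained
model = BertForSequenceClassification.from_pretrained(output_dir)
tokenizer = BertTokenizerFast.from_pretrained(output_dir)
model.eval()  # disable dropout for inference

def predict_complaint(FaultD):
    keywords = extract_keywords(FaultD)
    inputs = tokenizer(keywords, return_tensors='pt', truncation=True, padding=True, max_length=128)
    with torch.no_grad():  # no gradients are needed for prediction
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    # index_to_label is the inverse of the label_dict used during training
    return index_to_label[predicted_class]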

# Load new data and predict complaints
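
torch.argmax returns only the winning class. When predictions are later reused for retraining, it can help to know how confident the model was. The sketch below shows the softmax step in plain Python so it runs without PyTorch; with a real model, torch.softmax(outputs.logits, dim=1) performs the same computation. The helper names softmax and top_prediction are hypothetical, and the logits are made up.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, index_to_label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return index_to_label[best], probs[best]


# Made-up logits for a three-class example.
index_to_label = {0: "Battery Drain", 1: "Brake Noise", 2: "Oil Leak"}
label, confidence = top_prediction([0.5, 2.5, 0.1], index_to_label)
print(label, round(confidence, 3))
```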

Online Learning

Retrain the model with new predictions to adapt to new data.

def retrain_model(new_data_df):
    # Combine new data with existing training data
    # Retrain the model
    trainer.train()
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
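
The "combine new data with existing training data" step is left as a comment in the function above. One way to do it with pandas is sketched below, assuming both frames carry a FaultD text column and a label column. The label column name Complaint and the helper combine_training_data are assumptions. When a description appears more than once, the newest label is kept.

```python
import pandas as pd


def combine_training_data(existing_df, new_data_df, text_col="FaultD", label_col="Complaint"):
    """Append new labeled rows to the existing training set, dropping duplicates."""
    # Keep only the columns needed for training.
    combined = pd.concat(
        [existing_df[[text_col, label_col]], new_data_df[[text_col, label_col]]],
        ignore_index=True,
    )
    # Drop rows with missing text or label, then keep the latest label per description.
    combined = combined.dropna(subset=[text_col, label_col])
    combined = combined.drop_duplicates(subset=[text_col], keep="last")
    return combined.reset_index(drop=True)


# Toy frames for illustration only.
existing_df = pd.DataFrame(
    {"FaultD": ["brake squeal at low speed", "oil drip under engine"],
     "Complaint": ["Brake Noise", "Oil Leak"]}
)
new_data_df = pd.DataFrame(
    {"FaultD": ["oil drip under engine", "battery dead overnight"],
     "Complaint": ["Oil Leak", "Battery Drain"]}
)
combined_df = combine_training_data(existing_df, new_data_df)
print(combined_df)
```

Because the new labels are the model's own predictions, reviewing them before retraining reduces the risk of reinforcing earlier mistakes.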

Saving Results

Save the predictions along with root causes and failure modes to an Excel file.

new_df['Prediction'] = new_df['FaultD'].apply(predict_complaint)
new_df['Root Cause'] = new_df['FaultD'].apply(extract_root_cause)
new_df['Failure Mode'] = new_df['FaultD'].apply(extract_failure_mode)
new_df.to_excel(output_file_path, index=False)

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgements

Contact

For any queries or suggestions, please contact jatintopakar@yahoo.com.
