This repository contains a collection of Machine Learning projects covering fraud detection, financial sentiment analysis, credit scoring, and crop recommendation. Each project is self-contained and demonstrates a specific ML/AI concept with a clear implementation and results.
This project implements a credit card fraud detection system using Support Vector Machines (SVM) and Principal Component Analysis (PCA). The model analyzes financial transactions to classify them as fraudulent or authentic, helping mitigate risks in digital financial systems.
- Machine Learning: SVM, PCA
- Libraries: Scikit-learn, NumPy, Pandas, Matplotlib
- Dataset: Financial transaction records with fraud labels
- Dimensionality Reduction: PCA improves model efficiency.
- Fraud Classification: SVM handles high-dimensional transaction data.
- Data Preprocessing: Balanced dataset using oversampling for better fraud detection.
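
The oversampling step mentioned above could look roughly like the sketch below. The source does not say which oversampling method was used, so plain random oversampling with `sklearn.utils.resample` is an assumption, and the helper `oversample_minority` and the toy data are hypothetical. It also assumes a binary label (0 = authentic, 1 = fraudulent).

```python
import numpy as np
from sklearn.utils import resample


def oversample_minority(X, y, random_state=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_majority = int(counts.max())

    X_min, y_min = X[y == minority], y[y == minority]
    # Sample with replacement so the minority class grows to the majority size.
    X_up, y_up = resample(
        X_min, y_min, replace=True, n_samples=n_majority, random_state=random_state
    )
    X_bal = np.vstack([X[y != minority], X_up])
    y_bal = np.concatenate([y[y != minority], y_up])
    return X_bal, y_bal


# Toy data (not the project's dataset): 95 authentic and 5 fraudulent rows.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 4))
y_toy = np.array([0] * 95 + [1] * 5)

X_bal, y_bal = oversample_minority(X_toy, y_toy)
print(np.bincount(y_bal))
```

In practice, oversampling is usually applied to the training split only, so that duplicated fraud rows do not leak into the test set.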
Fraudsters are categorized based on their corporate and community level roles.
To reduce dimensionality, PCA was applied, and the scree plot below shows the eigenvalues of each principal component.
Our pipeline standardizes the data, applies PCA for dimensionality reduction, and then uses an SVM classifier to predict fraudulent transactions.
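
The pipeline described above can be sketched with Scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, and the number of components (5) and the RBF kernel are placeholder choices. The `explained_variance_` attribute holds the eigenvalues that a scree plot displays.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the transaction data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

fraud_pipeline = Pipeline(
    [
        ("scale", StandardScaler()),   # zero mean, unit variance per feature
        ("pca", PCA(n_components=5)),  # keep 5 components (assumed value)
        ("svm", SVC(kernel="rbf")),    # kernel choice is an assumption
    ]
)
fraud_pipeline.fit(X_train, y_train)

# Eigenvalues of the retained components, in descending order.
eigenvalues = fraud_pipeline.named_steps["pca"].explained_variance_
test_accuracy = fraud_pipeline.score(X_test, y_test)
print("eigenvalues:", eigenvalues.round(3))
print("test accuracy:", test_accuracy)
```

Wrapping the three steps in a single `Pipeline` means the scaler and PCA are fitted on the training split only and then reused unchanged at prediction time.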
- Model Accuracy: 72.63% (test), 72.93% (train)
- Fraudulent Transactions Identified: Mainly found in Transfer & Cash-Out transactions.
- Dimensionality Reduction Success: PCA helped optimize performance while retaining fraud detection accuracy.
This project implements a news sentiment analysis application built on a DistilRoBERTa model fine-tuned for financial news and accessed through the Hugging Face API. The model classifies financial texts, such as market reports and news articles, as positive, negative, or neutral to help users gauge market sentiment.
- Machine Learning: DistilRoBERTa (fine-tuned for financial sentiment analysis)
- Libraries and Tools: Hugging Face Transformers, Flask, PostgreSQL
- Deployment: Flask API, hosted on Heroku
- Real-Time Sentiment Analysis: Uses Hugging Face API for instant results.
- Financial-Specific Model: Trained on financial news to improve accuracy in economic contexts.
- Web Application Interface: Built using Flask, allowing users to input text and receive real-time analysis.
- User inputs financial text (e.g., a market report or company earnings statement).
- The text is sent to the Hugging Face API, which classifies sentiment as positive, negative, or neutral.
- The results are displayed in a user-friendly interface.
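
The three steps above can be sketched as a small function that takes text, passes it to a classifier, and returns the top label. The response shape (a list of `label`/`score` dictionaries) is an assumption modeled on typical text-classification output, and `fake_classify` is a hypothetical stand-in for the hosted model so the example runs without network access.

```python
from typing import Callable, Dict, List, Tuple, Union

LabelScore = Dict[str, Union[str, float]]


def analyze(
    text: str, classify: Callable[[str], List[LabelScore]]
) -> Tuple[str, float]:
    """Send text to a classifier and return the highest-scoring (label, score)."""
    if not text.strip():
        raise ValueError("input text is empty")
    scores = classify(text)
    best = max(scores, key=lambda item: item["score"])
    return str(best["label"]), float(best["score"])


def fake_classify(text: str) -> List[LabelScore]:
    """Hypothetical stand-in for the hosted model; always returns fixed scores."""
    return [
        {"label": "negative", "score": 0.05},
        {"label": "neutral", "score": 0.15},
        {"label": "positive", "score": 0.80},
    ]


label, score = analyze("Quarterly earnings beat expectations.", fake_classify)
print(label, score)
```

Passing the classifier in as an argument keeps the request logic separate from the result handling, so the real API call can be swapped in without changing `analyze`.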
The pretrained model used for this task: 🔗 DistilRoBERTa fine-tuned for financial sentiment analysis
This project develops a credit scoring model using various machine learning techniques to predict an individual's creditworthiness based on financial and business data. The model leverages XGBoost, Random Forest, Neural Networks, and other algorithms to enhance prediction accuracy.
- Machine Learning: XGBoost, Random Forest, Neural Networks, SVM, KNN, Linear Regression
- Libraries: Scikit-learn, Pandas, NumPy, Matplotlib
- Data Processing: One-hot encoding, Label Encoding, StandardScaler for normalization
- Automated Data Processing: Handles missing values, categorical data encoding, and numerical transformations.
- Multiple Model Evaluation: Compares various models using MSE, RMSE, MAE, R-squared, and Adjusted R-squared.
- Optimal Model Selection: Identifies the most accurate model for credit score prediction.
- Top Performing Model: XGBoost with the highest R-squared score of 96.78%.
- Feature Normalization Success: StandardScaler helped improve model convergence and accuracy.
- Regression Task Optimized: Removed classification metrics (e.g., F1-Score) from the evaluation, since the target is a continuous score and the task is regression rather than classification.
Below is a comparison table showing the evaluation metrics for different models tested in this project:
The table compares the error metrics of the machine learning models used for credit scoring. XGBoost outperforms the other models with the lowest Mean Squared Error (MSE) and Root Mean Squared Error (RMSE), indicating the closest fit to the actual credit scores. The Random Forest and Neural Network models also perform well. K-Nearest Neighbors (KNN), on the other hand, has the highest error rates, making it the least suitable model for this task.
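
For reference, the five metrics used in the comparison can be computed as in the sketch below. The helper `regression_report` and the sample values are hypothetical, not taken from the project. Adjusted R-squared uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features, so it requires n > p + 1.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred, n_features):
    """Return MSE, RMSE, MAE, R-squared, and Adjusted R-squared as a dict."""
    n = len(y_true)
    if n <= n_features + 1:
        raise ValueError("need more samples than n_features + 1")
    mse = mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)
    # Penalize R-squared for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2,
        "Adjusted R2": adj_r2,
    }


# Made-up credit scores for illustration only.
y_true = [600, 650, 700, 750, 800]
y_pred = [610, 640, 705, 745, 790]
report = regression_report(y_true, y_pred, n_features=2)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Running the same function on each model's predictions gives one row per model in a comparison table.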
This project develops a Crop Recommendation System that uses Machine Learning techniques to analyze environmental conditions such as temperature, humidity, and rainfall, along with soil nutrients, and suggest the best crops for cultivation.
- Machine Learning: K-Means Clustering, SVM Classification
- Libraries: Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn
- Deployment: Flask API for real-time predictions
- Exploratory Data Analysis (EDA):
  - Provides statistical summaries and average environmental requirements for different crops.
  - Identifies suitable crops for different seasons (Summer, Winter, Rainy).
- Clustering with K-Means:
  - Determines optimal clusters using the Elbow Method.
  - Groups crops based on environmental conditions and soil nutrients.
- Crop Classification using SVM:
  - Achieves 97% accuracy in predicting the best crop based on given environmental conditions.
- Model Deployment:
  - Saves the trained SVM model as a joblib file for deployment in a Flask environment to make real-time predictions.
- Preprocesses the dataset by standardizing environmental data.
- Applies K-Means clustering to group similar crops.
- Trains an SVM classifier to recommend the best crop.
- Deploys the trained model via a Flask API for real-time crop prediction.
- Clustering Analysis: Groups crops into different clusters based on environmental conditions.
- Classification Model: SVM model achieves 97% accuracy in crop prediction.
- Deployment: The model is saved and can be used in a Flask API for real-world applications.
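
The workflow above (standardize, cluster with K-Means, train an SVM, save with joblib) can be sketched as follows. This is an illustration under assumptions rather than the project's code: the data is synthetic, the five feature columns and four classes are arbitrary, and the file name `crop_svm.joblib` is hypothetical. The inertia values collected in the loop are what the Elbow Method plots against the number of clusters.

```python
import os
import tempfile

import joblib
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the crop dataset (assumption: 5 numeric features, 4 crops).
X, y = make_blobs(n_samples=300, centers=4, n_features=5, random_state=0)

# Elbow Method: record the K-Means inertia for each candidate cluster count.
X_scaled = StandardScaler().fit_transform(X)
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias.append(km.inertia_)

# Train the classifier; the scaler is part of the saved model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
crop_model = make_pipeline(StandardScaler(), SVC())
crop_model.fit(X_train, y_train)

# Save and reload the model, as a Flask app would do at startup.
model_path = os.path.join(tempfile.mkdtemp(), "crop_svm.joblib")
joblib.dump(crop_model, model_path)
restored = joblib.load(model_path)
print("inertias:", [round(v, 1) for v in inertias])
print("prediction for first test row:", restored.predict(X_test[:1]))
```

Saving the scaler and the SVM together in one pipeline means the serving code only needs to load a single file and call `predict` on raw feature values.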



