rayan589/fraud_detection
# Project Overview

This project demonstrates a machine learning pipeline for fraud detection using two approaches:

## Python, Scikit-learn, and Pandas for data augmentation and model training.

## PySpark for data preprocessing and training machine learning models on large datasets.

# File Descriptions

## File 1: data_augmentation_and_ml_models.ipynb

Implements data augmentation using bootstrap sampling with Gaussian noise, followed by machine learning model training and evaluation with Scikit-learn and Pandas.

## File 2: data_preprocessing_and_pyspark_training.ipynb

Covers data preprocessing and machine learning model training with PySpark. It is designed to handle large datasets efficiently and shows where big data tools become useful.

## CSV file of the original dataset (1k rows)

## CSV file of the augmented dataset (10k rows)

## .txt file containing the links to the augmented 100k-row dataset and the augmented 1M-row dataset

# Key Insight

The datasets in this project grow from 1k to 1M rows. As datasets grow in size, big data tools like PySpark become crucial for scalability and performance.
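
# Example: bootstrap sampling with Gaussian noise

The augmentation described for File 1 (bootstrap sampling with Gaussian noise) could look roughly like the sketch below. It is not taken from the notebook: the function name `augment_with_bootstrap_noise`, the `noise_scale` parameter, the choice to scale noise by each column's standard deviation, and the synthetic three-column input are all assumptions made for illustration. It uses NumPy only; a Pandas DataFrame of numeric features can be passed in after calling `.to_numpy()`.

```python
import numpy as np


def augment_with_bootstrap_noise(X, n_samples, noise_scale=0.05, seed=0):
    """Resample rows of X with replacement, then add zero-mean Gaussian noise.

    Hypothetical helper, not the notebook's code. `noise_scale` is the noise
    standard deviation expressed as a fraction of each column's standard deviation.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Bootstrap step: draw row indices uniformly at random, with replacement.
    idx = rng.integers(0, X.shape[0], size=n_samples)
    resampled = X[idx]

    # Noise step: standard normal draws, rescaled per column.
    col_std = X.std(axis=0)
    noise = rng.normal(0.0, 1.0, size=resampled.shape) * (noise_scale * col_std)
    return resampled + noise


# Synthetic stand-in for numeric features (assumed shape: 1k rows, 3 columns).
original = np.random.default_rng(1).normal(size=(1000, 3))

# Grow 1k rows to 10k rows, mirroring the dataset sizes listed above.
augmented = augment_with_bootstrap_noise(original, n_samples=10_000)
```

In a fraud detection setting, noise like this would normally be applied only to continuous feature columns; the label column and any categorical columns would be carried over from the resampled rows unchanged.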