Kashvi05agarwal/Applied-Machine-Learning-Case-Study

RootedRise-ML — Applied Machine Learning Case Study

Problem

Given a work of literary fiction, can we build a model to identify whether statements about specific characters are consistent with the narrative?

This is a text classification task: for each statement and character pair, predict whether the statement contradicts established facts in the story, or aligns with them.

Domain: Natural Language Understanding, Literary Reasoning
Task: Binary Classification (Consistent / Contradicts)


Approach

Data Pipeline

  • Train/Test Split: Statements are stored in CSV files (data/train.csv is labeled, data/test.csv is not), and each statement references a source novel in Books/
  • Preprocessing: Character-level tokenization; sequential chunk processing with overlap
  • Book State Encoding: Build long-form narrative representations by processing entire texts in chunks
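
The preprocessing step above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the reserved unknown id 0, and the `chunk_size` and `overlap` parameters are assumptions made for the example.

```python
from typing import Dict, List


def build_char_vocab(text: str) -> Dict[str, int]:
    """Map each distinct character to an integer id (0 is reserved for unknown)."""
    return {ch: i + 1 for i, ch in enumerate(sorted(set(text)))}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Character-level tokenization: one id per character, 0 for unseen characters."""
    return [vocab.get(ch, 0) for ch in text]


def chunk_with_overlap(ids: List[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Split a token sequence into consecutive windows sharing `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(ids), step):
        chunks.append(ids[start:start + chunk_size])
        # Stop once a window reaches the end, so no redundant tail chunk is emitted.
        if start + chunk_size >= len(ids):
            break
    return chunks
```

The overlap lets each chunk carry some context from the previous one, so that a fact split across a chunk boundary is still seen whole at least once.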

Model Architecture

A custom transformer-like architecture with:

  • Multi-head attention-like mechanisms over token sequences
  • State-mediated gating to track evolving narrative beliefs
  • Sparse latent representations to capture textual patterns
  • Consistency scoring based on state divergence
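
One way to read "consistency scoring based on state divergence" is sketched below. The README does not say which divergence measure the model uses, so cosine distance is an assumption here, as are the function names and the rescaling to [0, 1]. Taking the minimum over chunks follows the rationale column described under Evaluation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def chunk_consistency(book_state: Sequence[float], statement_state: Sequence[float]) -> float:
    """Map cosine distance (range 0..2, an assumed divergence) to a score in 0..1."""
    divergence = 1.0 - cosine_similarity(book_state, statement_state)
    return 1.0 - divergence / 2.0


def statement_score(book_states: Sequence[Sequence[float]],
                    statement_state: Sequence[float]) -> float:
    """Score a statement by its least consistent chunk (minimum across chunks)."""
    if not book_states:
        raise ValueError("at least one book chunk state is required")
    return min(chunk_consistency(state, statement_state) for state in book_states)
```

Using the minimum means a single strongly divergent chunk is enough to pull the score down, which matches the intuition that one contradicting passage makes a statement inconsistent.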

Training Pipeline

  1. Initialize model with standard PyTorch configuration
  2. Load training data from CSV
  3. Build book-level narrative states by processing source texts
  4. Train on labeled examples using cross-entropy loss
  5. Predict on the held-out test set by applying the decision threshold to each consistency score
  6. Save results to results.csv

Hyperparameters: 4 layers, 128 embedding dim, 4 attention heads, AdamW optimizer (lr=3e-4)
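
The hyperparameters above could be collected in a single configuration object, as in this sketch. The class and field names are hypothetical; only the values come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameter values from the README; field names are illustrative."""
    n_layers: int = 4
    embed_dim: int = 128
    n_heads: int = 4
    optimizer: str = "AdamW"
    learning_rate: float = 3e-4

    def head_dim(self) -> int:
        """Per-head width; the embedding must divide evenly across heads."""
        if self.embed_dim % self.n_heads != 0:
            raise ValueError("embed_dim must be divisible by n_heads")
        return self.embed_dim // self.n_heads


config = TrainConfig()
# With PyTorch installed, the optimizer would typically be built as:
#   torch.optim.AdamW(model.parameters(), lr=config.learning_rate)
```

With 128 embedding dimensions and 4 heads, each head works on a 32-dimensional slice.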


Evaluation

  • The trained model weights are saved to bdh_trained.pt
  • The decision threshold is optimized on validation data and saved to threshold.pt
  • Test predictions saved to results.csv with:
    • Prediction ID
    • Binary prediction (0 = contradicts, 1 = consistent)
    • Prediction rationale (minimum consistency score across chunks)
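
Threshold optimization on validation data can be sketched as a simple sweep. The selection criterion is not stated in this README, so accuracy is an assumption, and the function names are hypothetical. Predictions follow the label convention above: 1 (consistent) when the score reaches the threshold, 0 (contradicts) otherwise.

```python
from typing import List, Sequence, Tuple


def predict(scores: Sequence[float], threshold: float) -> List[int]:
    """Label 1 (consistent) when the score reaches the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in scores]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try each observed score as a cut-off and keep the most accurate one."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    best_t, best_acc = 0.0, -1.0
    for candidate in sorted(set(scores)):
        acc = accuracy(predict(scores, candidate), labels)
        if acc > best_acc:  # strict comparison keeps the lowest threshold on ties
            best_t, best_acc = candidate, acc
    return best_t, best_acc
```

Only observed scores are tried as candidates, which is enough to reach every distinct split of the validation set except "predict 0 for everything".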

Note: This is an applied case study. Evaluation metrics and results are logged during training but not published here; the focus is on demonstrating a complete ML pipeline from raw data to predictions.


Project Structure

.
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── rooted_rise_model.py           # Model architecture definition
├── rooted_rise_train.py           # Training script
├── rooted_rise_test.py            # Inference & evaluation script
│
├── data/
│   ├── train.csv                  # Training examples (id, content, label, book_name, char)
│   └── test.csv                   # Test examples (id, content, book_name, char)
│
├── Books/
│   ├── The Count of Monte Cristo.txt
│   └── In search of the castaways.txt
│
├── bdh_trained.pt                 # Trained model weights
├── threshold.pt                   # Learned decision threshold
└── results.csv                    # Test predictions (output)
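
Given the column layout listed above, the CSV files could be read with the standard library as sketched here. The `Example` class and function names are hypothetical. The sketch also assumes that `book_name` matches the file name in Books/ without the `.txt` extension, which this README does not confirm. Labels are kept as raw strings because their encoding in train.csv is not documented.

```python
import csv
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional


@dataclass
class Example:
    id: str
    content: str
    book_name: str
    char: str
    label: Optional[str] = None  # present in train.csv, absent in test.csv


def parse_examples(lines: Iterable[str]) -> List[Example]:
    """Parse CSV lines (header first) into Example records."""
    reader = csv.DictReader(lines)
    return [
        Example(
            id=row["id"],
            content=row["content"],
            book_name=row["book_name"],
            char=row["char"],
            label=row.get("label"),  # None when the column is missing
        )
        for row in reader
    ]


def load_examples(csv_path: str) -> List[Example]:
    """Read data/train.csv or data/test.csv from disk."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return parse_examples(handle)


def book_path(book_name: str, books_dir: str = "Books") -> Path:
    """Resolve a book's text file, assuming '<book_name>.txt' naming."""
    return Path(books_dir) / f"{book_name}.txt"
```

For example, `load_examples("data/train.csv")` returns one record per labeled statement, and `book_path(example.book_name)` points at the novel that statement refers to.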

Tech Stack

  • Framework: PyTorch (2.0+)
  • Data Processing: Pandas, NumPy
  • Language: Python 3.8+
  • Compute: CPU or CUDA-enabled GPU

Usage

Install Dependencies

pip install -r requirements.txt

Train the Model

python rooted_rise_train.py
  • Expects: data/train.csv and book texts in Books/
  • Outputs: bdh_trained.pt, threshold.pt

Run Inference

python rooted_rise_test.py
  • Expects: data/test.csv, trained model weights, book texts
  • Outputs: results.csv with predictions

Notes

  • This project demonstrates a complete applied ML pipeline: data loading → model training → threshold optimization → evaluation on held-out data.
  • The architecture is experimental and designed for this specific task (literary consistency classification).
  • All predictions are saved to results.csv for post-hoc analysis and error investigation.
  • Book state encoding is applied on a per-text basis to provide narrative context to the classifier.

Author

This is a portfolio case study demonstrating applied machine learning fundamentals: problem definition, data engineering, model training, and structured evaluation.
