Given a work of literary fiction, can we build a model to identify whether statements about specific characters are consistent with the narrative?
This is a text classification task: for each statement and character pair, predict whether the statement contradicts established facts in the story, or aligns with them.
Domain: Natural Language Understanding, Literary Reasoning
Task: Binary Classification (Consistent / Contradicts)
- Train/Test Split: CSV-based labeled statements with source documents (novels)
- Preprocessing: Character-level tokenization; sequential chunk processing with overlap
- Book State Encoding: Build long-form narrative representations by processing entire texts in chunks
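The overlapping-chunk processing described above can be sketched as follows. Chunk size and overlap values here are illustrative defaults, not the project's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 2048, overlap: int = 256):
    """Split a book into overlapping character-level chunks.

    The overlap preserves context that would otherwise be severed at
    chunk boundaries (sizes are illustrative, not the project's own).
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so narrative state can be carried across boundaries.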
A custom transformer-like architecture with:
- Multi-head attention-like mechanisms over token sequences
- State-mediated gating to track evolving narrative beliefs
- Sparse latent representations to capture textual patterns
- Consistency scoring based on state divergence
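One way such a state-gated attention block could look is sketched below. The class name, shapes, and use of a GRU cell for the state update are assumptions for illustration, not the project's actual implementation:

```python
import torch
import torch.nn as nn

class StateGatedBlock(nn.Module):
    """Illustrative sketch: attention over tokens, gated by a running
    narrative-state vector (names and shapes are assumptions)."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)  # mixes token features with state
        self.state_update = nn.GRUCell(dim, dim)

    def forward(self, x, state):
        # x: (batch, seq, dim); state: (batch, dim)
        attn_out, _ = self.attn(x, x, x)
        s = state.unsqueeze(1).expand_as(attn_out)
        g = torch.sigmoid(self.gate(torch.cat([attn_out, s], dim=-1)))
        x = x + g * attn_out                             # state-mediated gating
        state = self.state_update(x.mean(dim=1), state)  # evolve narrative state
        return x, state
```

Stacking several such blocks while threading `state` through successive chunks yields the long-form narrative representation used for consistency scoring.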
- Initialize model with standard PyTorch configuration
- Load training data from CSV
- Build book-level narrative states by processing source texts
- Train on labeled examples using cross-entropy loss
- Evaluate on held-out test set with threshold-based predictions
- Save results to `results.csv`
Hyperparameters: 4 layers, 128 embedding dim, 4 attention heads, AdamW optimizer (lr=3e-4)
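Under the stated hyperparameters, the optimizer and loss setup might look like the sketch below. A plain linear classifier stands in for the actual model so the snippet runs on its own; the data tensors are placeholders:

```python
import torch
import torch.nn as nn

# Stand-in for the real model (4 layers, 128-dim, 4 heads); a simple
# classifier head is used here so the sketch is self-contained.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()  # binary task as 2-class logits

features = torch.randn(32, 128)      # placeholder chunk features
labels = torch.randint(0, 2, (32,))  # 0 = contradicts, 1 = consistent

for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
```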
- Model is trained and persisted to `bdh_trained.pt`
- Decision threshold optimized on validation data
- Test predictions saved to `results.csv` with:
  - Prediction ID
  - Binary prediction (0 = contradicts, 1 = consistent)
  - Prediction rationale (minimum consistency score across chunks)
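A threshold swept on validation scores and then applied to test-time minimum consistency scores could be sketched as follows. The column names follow the output format above; the scores, IDs, and accuracy-maximizing sweep are placeholders, not the project's actual criterion:

```python
import numpy as np
import pandas as pd

def best_threshold(scores, labels):
    """Pick the cutoff maximizing validation accuracy
    (a simple sweep; the project may use a different criterion)."""
    candidates = np.unique(scores)
    accs = [((scores >= t).astype(int) == labels).mean() for t in candidates]
    return candidates[int(np.argmax(accs))]

# Placeholder validation scores and labels
val_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
val_labels = np.array([0, 0, 0, 1, 1])
t = best_threshold(val_scores, val_labels)

# Minimum consistency score across chunks drives the final prediction
test_scores = np.array([0.2, 0.9])
df = pd.DataFrame({
    "id": [101, 102],
    "prediction": (test_scores >= t).astype(int),
    "score": test_scores,
})
df.to_csv("results.csv", index=False)
```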
Note: This is an applied case study. Evaluation metrics and results are logged during training but not published here; the focus is on demonstrating a complete ML pipeline from raw data to predictions.
```
.
├── README.md                # This file
├── requirements.txt         # Python dependencies
├── rooted_rise_model.py     # Model architecture definition
├── rooted_rise_train.py     # Training script
├── rooted_rise_test.py      # Inference & evaluation script
│
├── data/
│   ├── train.csv            # Training examples (id, content, label, book_name, char)
│   └── test.csv             # Test examples (id, content, book_name, char)
│
├── Books/
│   ├── The Count of Monte Cristo.txt
│   └── In search of the castaways.txt
│
├── bdh_trained.pt           # Trained model weights
├── threshold.pt             # Learned decision threshold
└── results.csv              # Test predictions (output)
```
- Framework: PyTorch (2.0+)
- Data Processing: Pandas, NumPy
- Language: Python 3.8+
- Compute: CPU or CUDA-enabled GPU
```
pip install -r requirements.txt
```

```
python rooted_rise_train.py
```

- Expects: `data/train.csv` and book texts in `Books/`
- Outputs: `bdh_trained.pt`, `threshold.pt`

```
python rooted_rise_test.py
```

- Expects: `data/test.csv`, trained model weights, book texts
- Outputs: `results.csv` with predictions
- This project demonstrates a complete applied ML pipeline: data loading → model training → threshold optimization → evaluation on held-out data.
- The architecture is experimental and designed for this specific task (literary consistency classification).
- All predictions are saved to `results.csv` for post-hoc analysis and error investigation.
- Book state encoding is applied on a per-text basis to provide narrative context to the classifier.
This is a portfolio case study demonstrating applied machine learning fundamentals: problem definition, data engineering, model training, and structured evaluation.