Pull request overview
This PR introduces an experimental pipeline for analyzing causal relationships in chain-of-thought (CoT) reasoning. The code generates CoT reasoning traces from a language model, computes sentence-level causal influence matrices using attention masking, extracts high-importance "thought anchors," and trains classifiers to categorize reasoning steps into 8 semantic classes (e.g., problem setup, active computation, self-checking).
- Implements causal tracing via attention masking to measure how masking source sentences affects target sentence predictions (KL divergence)
- Classifies reasoning sentences into 8 anchor classes and selects high-importance sentences based on causal outgoing influence
- Trains Logistic Regression and MLP classifiers on hidden states to predict anchor classes
text_before = problem + " " + " ".join(sentences[:idx])
hidden_state = get_hidden_state(text_before)
The feature extraction concatenates problem + " " + " ".join(sentences[:idx]) to get context before each anchor sentence. This reconstructs text from split sentences, which may not match the original CoT text due to lost formatting, punctuation, or whitespace. This inconsistency could affect the hidden state extraction. Consider storing the original character positions of sentences and slicing the original CoT text instead of reconstructing from split sentences.
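The offset-based approach suggested above could look roughly like the following. This is a minimal sketch, not the PR's actual splitter: the regex, `split_with_spans`, and `context_before` are hypothetical names introduced for illustration, and the `problem + " "` prefix is simply carried over from the reviewed line.

```python
import re

# Hypothetical splitter (an assumption, not the PR's code): a sentence runs
# from a non-space character to terminal punctuation followed by whitespace,
# or to the end of the text.
_SENTENCE_RE = re.compile(r"\S.*?(?:[.!?]+(?=\s|$)|$)", re.DOTALL)


def split_with_spans(cot_text):
    """Return (start, end) character offsets for each sentence in cot_text."""
    return [m.span() for m in _SENTENCE_RE.finditer(cot_text)]


def context_before(problem, cot_text, spans, idx):
    """Slice the original CoT up to sentence idx, keeping its exact whitespace."""
    start = spans[idx][0]
    return problem + " " + cot_text[:start]


cot = "First, set x = 2.  Then compute\n3.5 * x. Done"
spans = split_with_spans(cot)
sentences = [cot[s:e] for s, e in spans]

# The sliced context keeps the double space and the newline of the original;
# the joined reconstruction collapses them to single spaces.
sliced = context_before("Q:", cot, spans, 2)
rejoined = "Q:" + " " + " ".join(sentences[:2])
```

Storing `spans` alongside the sentences at generation time would let later phases recover each sentence with `cot[s:e]` without re-splitting.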
@copilot open a new pull request to apply changes based on this feedback
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, stratify=y)
The train/test split uses stratify=y which is good practice, but with small class counts (min_samples=2), some classes may have only 2-3 samples total. Stratification with very small classes can fail or result in insufficient test samples for meaningful evaluation. Consider increasing min_samples to at least 5-10, or using cross-validation instead of a single train/test split for more robust evaluation with limited data.
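One way to apply the higher `min_samples` threshold is to drop rare classes before any splitting. The sketch below uses only the standard library; `filter_rare_classes` is a hypothetical helper, and the default of 5 is taken from the lower end of the range suggested in the comment rather than from the PR.

```python
from collections import Counter


def filter_rare_classes(X, y, min_samples=5):
    """Keep only samples whose class appears at least min_samples times.

    X and y are parallel sequences (lists or arrays indexable by int).
    Returns new lists; the inputs are left untouched.
    """
    counts = Counter(y)
    keep = [i for i, label in enumerate(y) if counts[label] >= min_samples]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: class "setup" has 6 samples, class "check" only 2.
X = [[float(i)] for i in range(8)]
y = ["setup"] * 6 + ["check"] * 2

X_kept, y_kept = filter_rare_classes(X, y, min_samples=5)
```

The filtered data could then be passed to scikit-learn's `StratifiedKFold` or `cross_val_score` in place of the single `train_test_split`, with `n_splits` chosen no larger than `min_samples` so that every fold can contain every remaining class.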
# EXTRACT ANCHORS

print("="*80)
print("PHASE 2: EXTRACTING THOUGHT ANCHORS")
This is labeled as "PHASE 2" but Phase 2 was already used at line 207. The phases are misnumbered. This should be Phase 3, and subsequent phases should be renumbered (current Phase 3 → Phase 4, current Phase 4 → Phase 5).
- print("PHASE 2: EXTRACTING THOUGHT ANCHORS")
+ print("PHASE 3: EXTRACTING THOUGHT ANCHORS")
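Beyond the one-line renumbering, the phase numbers could be generated rather than hard-coded, so that inserting a phase later cannot reintroduce a duplicate. A small sketch under that assumption; `announce_phase` is a hypothetical helper, and only the "EXTRACTING THOUGHT ANCHORS" title comes from the reviewed code.

```python
import itertools

# Shared counter: each call to announce_phase takes the next phase number.
_phase_counter = itertools.count(1)


def announce_phase(title, width=80):
    """Print a banner with an automatically incremented phase number."""
    header = f"PHASE {next(_phase_counter)}: {title}"
    print("=" * width)
    print(header)
    return header
```

Each phase would then call `announce_phase("EXTRACTING THOUGHT ANCHORS")` and similar in execution order, and the printed numbers follow that order.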
    'outgoing': outgoing_feature
})

pickle.dump(all_features, open(ckpt_features, 'wb'))
File is opened but is not closed.
- pickle.dump(all_features, open(ckpt_features, 'wb'))
+ with open(ckpt_features, 'wb') as f:
+     pickle.dump(all_features, f)
exploration/rpc.py
Outdated
pickle.dump(clf_lr, open(f"{checkpoint_dir}/classifier_lr.pkl", 'wb'))
pickle.dump(clf_mlp, open(f"{checkpoint_dir}/classifier_mlp.pkl", 'wb'))
pickle.dump(scaler, open(f"{checkpoint_dir}/scaler.pkl", 'wb'))
pickle.dump(class_to_idx, open(f"{checkpoint_dir}/class_to_idx.pkl", 'wb'))
File is opened but is not closed.
- pickle.dump(clf_lr, open(f"{checkpoint_dir}/classifier_lr.pkl", 'wb'))
- pickle.dump(clf_mlp, open(f"{checkpoint_dir}/classifier_mlp.pkl", 'wb'))
- pickle.dump(scaler, open(f"{checkpoint_dir}/scaler.pkl", 'wb'))
- pickle.dump(class_to_idx, open(f"{checkpoint_dir}/class_to_idx.pkl", 'wb'))
+ with open(f"{checkpoint_dir}/classifier_lr.pkl", 'wb') as f:
+     pickle.dump(clf_lr, f)
+ with open(f"{checkpoint_dir}/classifier_mlp.pkl", 'wb') as f:
+     pickle.dump(clf_mlp, f)
+ with open(f"{checkpoint_dir}/scaler.pkl", 'wb') as f:
+     pickle.dump(scaler, f)
+ with open(f"{checkpoint_dir}/class_to_idx.pkl", 'wb') as f:
+     pickle.dump(class_to_idx, f)
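Since the same open-and-dump pattern repeats four times, it could also be factored into a helper that always closes the file. This is a sketch of one possible refactor, not part of the suggested change: `save_pickle` and `save_artifacts` are hypothetical names, and the file names are derived from the dictionary keys.

```python
import pickle
from pathlib import Path


def save_pickle(obj, path):
    """Pickle obj to path; the with block closes the file even on error."""
    path = Path(path)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    return path


def save_artifacts(checkpoint_dir, artifacts):
    """Write each artifact to <checkpoint_dir>/<name>.pkl and return the paths."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    return [
        save_pickle(obj, checkpoint_dir / f"{name}.pkl")
        for name, obj in artifacts.items()
    ]
```

With this in place, the four calls would collapse to something like `save_artifacts(checkpoint_dir, {"classifier_lr": clf_lr, "classifier_mlp": clf_mlp, "scaler": scaler, "class_to_idx": class_to_idx})`, which produces the same file names as the original lines.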
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@abirharrasse I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.