Use character positions for text slicing instead of reconstructing from sentences #3

Draft
Copilot wants to merge 4 commits into abir-explorations from
copilot/sub-pr-2

Copilot AI commented Dec 10, 2025

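Feature extraction previously rebuilt the text preceding each sentence by joining the split sentences (problem + " " + " ".join(sentences[:idx])), which discarded the formatting and whitespace of the original CoT. Because the rebuilt text can differ from the original, this affects the accuracy of hidden state extraction.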

Changes

  • split_into_sentences(): Now returns a (sentences, positions) tuple, where positions records each sentence's character boundaries in the original text
  • Data storage: Added a sentence_positions field to preserve the original text structure
  • Feature extraction: Replaced sentence reconstruction with character slicing of the original text: cot[:end_pos]
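
A minimal sketch of how these pieces could fit together, assuming a simple regex-based splitter. The splitting rules actually used in this PR are not shown here, so the regex, the span convention (each span starts where the previous one ended), and the helper prefix_up_to are illustrative assumptions only:

```python
import re

# Assumption: a sentence ends at a run of . ! or ? followed by whitespace or end of text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def split_into_sentences(text):
    """Return (sentences, positions).

    sentences: stripped sentence strings.
    positions: (start, end) character offsets into the original text.
    """
    sentences, positions = [], []
    start = 0
    for match in _SENTENCE_END.finditer(text):
        end = match.end()
        sentences.append(text[start:end].strip())
        positions.append((start, end))
        start = end
    # Keep any trailing text that lacks terminal punctuation.
    if text[start:].strip():
        sentences.append(text[start:].strip())
        positions.append((start, len(text)))
    return sentences, positions


def prefix_up_to(cot, positions, idx):
    """Hypothetical helper: original text up to the end of sentence idx."""
    end_pos = positions[idx][1]
    return cot[:end_pos]
```

Because the prefix is sliced from cot itself, any spacing or line breaks between sentences carry through unchanged.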

Example

# Before: loses double spaces between sentences
cot = "First step.  Second step.  Third step."
sentences = split_into_sentences(cot)  # ['First step.', 'Second step.', ...]
text_before = " ".join(sentences[:2])  # "First step. Second step." (single space)

# After: preserves original formatting
sentences, positions = split_into_sentences(cot)
text_before = cot[:positions[1][1]]  # "First step.  Second step." (double space)

Position tracking uses unstripped boundaries for accurate slicing while sentences remain stripped for classification.
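
One way to sanity-check stored positions, sketched under the assumption that the unstripped spans are contiguous and cover the whole text. The record layout and check_positions are hypothetical; only the sentence_positions field name comes from this PR:

```python
cot = "First step.  Second step.  Third step."

# Hypothetical stored record; spans include the whitespace before each sentence.
record = {
    "cot": cot,
    "sentence_positions": [(0, 11), (11, 25), (25, 38)],
}


def check_positions(text, positions):
    """Return True if the spans are non-empty, contiguous, and cover the text."""
    cursor = 0
    for start, end in positions:
        if start != cursor or end <= start:
            return False
        cursor = end
    return cursor == len(text)


# Raw spans keep the original whitespace and are what slicing relies on.
raw_spans = [cot[s:e] for s, e in record["sentence_positions"]]
# Stripped sentences are what a classifier would see.
stripped = [span.strip() for span in raw_spans]
```

Under this assumption, joining the raw spans reproduces cot exactly, which is the property that joining stripped sentences with single spaces cannot guarantee.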


Copilot AI and others added 3 commits December 10, 2025 09:08
…ng sentences

Co-authored-by: abirharrasse <81148161+abirharrasse@users.noreply.github.com>
Co-authored-by: abirharrasse <81148161+abirharrasse@users.noreply.github.com>
Co-authored-by: abirharrasse <81148161+abirharrasse@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Address feedback on initial experiments PR" to "Use character positions for text slicing instead of reconstructing from sentences" Dec 10, 2025
Copilot AI requested a review from abirharrasse December 10, 2025 09:14