refactor: consolidate scoring modules into acestep/core/scoring/ #603
Move DiT alignment scoring and LM perplexity scoring from top-level
acestep/ into acestep/core/scoring/ with clearer naming:
- dit_alignment_score.py -> core/scoring/dit_alignment.py (MusicStampsAligner)
                         -> core/scoring/dit_score.py (MusicLyricScorer)
                         -> core/scoring/_dtw.py (shared DTW utilities)
- test_time_scaling.py -> core/scoring/lm_score.py
Split the 877-LOC dit_alignment_score.py into three focused modules
to comply with the 200 LOC module size policy.
Update all import references in:
- lyric_timestamp.py, lyric_score.py, scoring.py (UI)
- lyric_alignment_test.py (mock module paths)
- llm_inference.py (comment reference)
Delete the old top-level files (no backward-compat shims needed as
all internal callers have been updated).
Add 17 unit tests covering DTW, median filter, PMI functions, and
reward score aggregation.
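For orientation, the resulting package layout implied by the description above (file roles paraphrased; `__init__.py` and `scoring_test.py` appear in the review below):

```
acestep/core/scoring/
├── __init__.py        # public exports (MusicStampsAligner, MusicLyricScorer, ...)
├── _dtw.py            # shared DTW and median-filter utilities
├── dit_alignment.py   # MusicStampsAligner: LRC timestamp alignment
├── dit_score.py       # MusicLyricScorer: coverage/monotonicity/confidence metrics
├── lm_score.py        # LM perplexity / PMI reward scoring
└── scoring_test.py    # 17 unit tests
```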
📝 Walkthrough

The codebase reorganizes alignment and scoring functionality from a monolithic top-level module into the acestep/core/scoring/ package.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)

acestep/core/scoring/lm_score.py (1)

1-471: 🛠️ Refactor suggestion | 🟠 Major — Module exceeds 200 LOC hard cap (471 lines).

This file is 471 LOC, which significantly exceeds the 200 LOC hard cap. While the PR correctly split dit_alignment_score.py, this module should also be split by responsibility:

- Pure scoring utilities (`pmi_score`, `pmi_to_normalized_score`) → `_pmi_utils.py`
- Model context management (`_load_scoring_model_context`, `_get_logits_and_target_for_scoring`) → `_scoring_context.py`
- Scoring calculations (`_calculate_topk_recall`, `_calculate_metadata_recall`, `_calculate_log_prob`, `calculate_reward_score`) → keep in `lm_score.py`
- Public API (`calculate_pmi_score_per_condition`) → keep in `lm_score.py`, importing from the split modules

Based on learnings: "If a module would exceed 200 LOC: split by responsibility before merging, or add a short justification in PR notes with a concrete follow-up split plan."
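For illustration, a minimal sketch of what the proposed `_pmi_utils.py` extraction could contain. The signatures and the normalization choice are assumptions inferred from the function names above, not the actual implementation in `lm_score.py`:

```python
# Hypothetical _pmi_utils.py -- pure functions with no model or I/O dependencies.
import math


def pmi_score(log_prob_cond: float, log_prob_uncond: float) -> float:
    """Pointwise mutual information: how much conditioning raises the
    target's log-probability over the unconditional baseline."""
    return log_prob_cond - log_prob_uncond


def pmi_to_normalized_score(pmi: float, scale: float = 1.0) -> float:
    """Map a raw PMI value into [0, 1] for reward aggregation.

    A sigmoid is assumed here; any monotonic squashing function would do.
    """
    z = max(min(pmi / scale, 60.0), -60.0)  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))
```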
🧹 Nitpick comments (3)

acestep/core/scoring/lm_score.py (1)

443-446: Redundant variable reassignment.

`prompt_uncond` is already defined on line 413 and reused on line 434. The reassignment on line 445 is redundant since the value is identical.

♻️ Remove redundant reassignment

```diff
 log_prob_cond = _calculate_log_prob(llm_handler, formatted_prompt, target_text)
-prompt_uncond = llm_handler.build_formatted_prompt_for_understanding(audio_codes="NO USER INPUT", is_negative_prompt=False)
 log_prob_uncond = _calculate_log_prob(llm_handler, prompt_uncond, target_text)
```

acestep/core/scoring/scoring_test.py (1)

122-126: Prefix unused variable with underscore.

The `explanation` variable is unpacked but never used. Per Python convention, prefix it with `_` to indicate it is intentionally unused.

♻️ Proposed fix

```diff
 def test_no_scores_returns_zero(self):
     """Empty scores dict should return zero reward."""
-    total, explanation = calculate_reward_score({})
+    total, _ = calculate_reward_score({})
     self.assertEqual(total, 0.0)
```

acestep/core/scoring/__init__.py (1)

13-20: Consider sorting `__all__` alphabetically.

Static analysis suggests sorting for consistency. This is optional since the current grouping by source module is also logical.

♻️ Alphabetically sorted

```diff
 __all__ = [
+    "MusicLyricScorer",
     "MusicStampsAligner",
+    "SentenceTimestamp",
     "TokenTimestamp",
-    "SentenceTimestamp",
-    "MusicLyricScorer",
     "calculate_pmi_score_per_condition",
     "calculate_reward_score",
 ]
```
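For reference, a consumer-side import of the package's public API, using the names exported in the `__all__` above:

```python
from acestep.core.scoring import (
    MusicLyricScorer,
    MusicStampsAligner,
    SentenceTimestamp,
    TokenTimestamp,
    calculate_pmi_score_per_condition,
    calculate_reward_score,
)
```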
In `acestep/core/scoring/_dtw.py` (excerpt, lines 90-107):

```python
def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
    """
    Apply median filter to tensor.

    Args:
        x: Input tensor
        filter_width: Width of median filter

    Returns:
        Filtered tensor
    """
    pad_width = filter_width // 2
    if x.shape[-1] <= pad_width:
        return x
    if x.ndim == 2:
        x = x[None, :]
    x = F.pad(x, (filter_width // 2, filter_width // 2, 0, 0), mode="reflect")
    result = x.unfold(-1, filter_width, 1).sort()[0][..., filter_width // 2]
```
Guard against non-positive filter_width values.
unfold will throw when filter_width <= 0; fail fast with a clear error to avoid hidden runtime crashes.
🔧 Proposed fix

```diff
 def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
     """
     Apply median filter to tensor.
 @@
     Returns:
         Filtered tensor
     """
+    if filter_width <= 0:
+        raise ValueError("filter_width must be a positive integer")
     pad_width = filter_width // 2
     if x.shape[-1] <= pad_width:
         return x
```

🤖 Prompt for AI Agents
return x🤖 Prompt for AI Agents
In `@acestep/core/scoring/_dtw.py` around lines 90 - 107, In median_filter, add an
early validation that raises a clear ValueError when filter_width is not a
positive integer (e.g., if filter_width <= 0), so unfold() cannot be called with
non-positive sizes; place this check at the start of median_filter (before
computing pad_width and before any padding/unfolding). Ensure the error message
mentions median_filter and the invalid filter_width value so callers fail fast
with a clear diagnostic.
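As a quick illustration of the fail-fast behavior, assuming the guard above is applied and that `median_filter` returns the filtered tensor (the excerpt cuts off before its return statement):

```python
import torch

from acestep.core.scoring._dtw import median_filter

x = torch.rand(4, 16)

# Odd positive widths preserve the last dimension: reflect-pad width//2 on
# each side, then a size-filter_width unfold with stride 1.
smoothed = median_filter(x, filter_width=5)
assert smoothed.shape[-1] == x.shape[-1]

# With the guard in place, a non-positive width fails immediately with a
# clear message instead of surfacing as an opaque unfold() error.
try:
    median_filter(x, filter_width=0)
except ValueError as err:
    print(err)  # filter_width must be a positive integer
```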
| """ | ||
| DiT Alignment Module | ||
|
|
||
| Provides lyrics-to-audio timestamp alignment using cross-attention matrices | ||
| from the DiT model. Produces LRC-format timestamps via bidirectional | ||
| consensus denoising and DTW. | ||
| """ |
🛠️ Refactor suggestion | 🟠 Major
Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.
Consider separating preprocessing/denoising, timestamp derivation, and LRC formatting into smaller modules.
As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
🤖 Prompt for AI Agents
In `@acestep/core/scoring/dit_alignment.py` around lines 1 - 7, The module DiT
alignment (dit_alignment.py) is over the 200-LOC cap; split its responsibilities
into three focused modules: extract preprocessing/denoising logic into a new
function/class (e.g., preprocess_and_denoise or Denoiser) that handles
bidirectional consensus denoising and any input normalization, move timestamp
derivation/DTW logic into a separate function (e.g., derive_timestamps or
TimestampDeriver) responsible for computing cross-attention-derived alignments
and DTW consensus, and isolate LRC formatting into a tiny formatter function
(e.g., format_lrc or LRCFormatter) that converts timestamps to LRC strings;
update dit_alignment.py to orchestrate these components (calling
preprocess_and_denoise -> derive_timestamps -> format_lrc) and add a short TODO
or tracking issue comment referencing the new modules if further decomposition
is planned to satisfy the size guideline.
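A minimal sketch of the orchestration this prompt describes. The helper functions below are runnable stand-ins for the suggested extracted modules, not existing code; the frame-to-seconds factor is an arbitrary assumption:

```python
# Hypothetical shape of dit_alignment.py after the proposed split.
import torch


def preprocess_and_denoise(cross_attn: torch.Tensor) -> torch.Tensor:
    """Stand-in for the extracted denoising step (bidirectional consensus)."""
    return cross_attn  # real consensus-denoising logic would go here


def derive_timestamps(consensus: torch.Tensor, tokens: list[str]) -> list[tuple[str, float]]:
    """Stand-in for the extracted DTW timestamp derivation."""
    # Toy mapping: peak-attention frame per token, at an assumed 0.1 s/frame.
    frames = consensus.argmax(dim=-1)
    return [(tok, frames[i].item() * 0.1) for i, tok in enumerate(tokens)]


def format_lrc(stamps: list[tuple[str, float]]) -> str:
    """Stand-in for the extracted LRC formatter."""
    return "\n".join(f"[{int(t // 60):02d}:{t % 60:05.2f}] {tok}" for tok, t in stamps)


def align_lyrics(cross_attn: torch.Tensor, tokens: list[str]) -> str:
    """Thin orchestrator: denoise -> derive timestamps -> format as LRC."""
    consensus = preprocess_and_denoise(cross_attn)
    stamps = derive_timestamps(consensus, tokens)
    return format_lrc(stamps)
```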
| """ | ||
| DiT Lyrics Quality Scorer | ||
|
|
||
| Evaluates lyrics-to-audio alignment quality using cross-attention energy | ||
| matrices. Computes Coverage, Monotonicity, and Path Confidence metrics | ||
| via tensor operations. | ||
| """ |
🛠️ Refactor suggestion | 🟠 Major
Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.
A split between attention preprocessing, path construction, and scoring metrics would keep each module under the cap.
As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
🤖 Prompt for AI Agents
In `@acestep/core/scoring/dit_score.py` around lines 1 - 7, The module
acestep/core/scoring/dit_score.py is over the 200‑LOC cap; split it into three
focused modules: move attention preprocessing logic into a new module exposing a
preprocess_attention(...) function, move dynamic/path construction into a
path_builder module exposing build_alignment_path(...) (or similar), and move
the metric computations into a scoring_metrics module with
compute_coverage(...), compute_monotonicity(...), and
compute_path_confidence(...). Update dit_score.py to become a thin orchestrator
that imports these functions and wires them together (retain original public API
names if any), add unit tests for each new module, and include a short follow‑up
TODO in the repo README describing the split and where to find each piece.
Ensure all imports, docstrings, and type hints are preserved when extracting
logic.
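For illustration, one plausible reading of the three metrics this prompt names, under assumed tensor conventions (an alignment `path` of `(token_idx, frame_idx)` pairs over an attention `energy` matrix); the actual formulas live in `dit_score.py`:

```python
# Hypothetical scoring_metrics sketch -- conventions are assumptions.
import torch


def compute_coverage(path: torch.Tensor, n_tokens: int) -> float:
    """Fraction of lyric tokens touched by the alignment path."""
    return path[:, 0].unique().numel() / max(n_tokens, 1)


def compute_monotonicity(path: torch.Tensor) -> float:
    """Fraction of consecutive path steps that never move backwards in time."""
    if path.shape[0] < 2:
        return 1.0
    steps = path[1:, 1] - path[:-1, 1]
    return (steps >= 0).float().mean().item()


def compute_path_confidence(energy: torch.Tensor, path: torch.Tensor) -> float:
    """Mean attention energy along the alignment path."""
    return energy[path[:, 0], path[:, 1]].mean().item()
```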
Summary by CodeRabbit

Chores

- Relocated alignment and scoring modules into the core/scoring package for improved code organization.

Tests

- Added unit tests covering DTW, median filter, PMI scoring, and reward score aggregation.

Documentation

- Updated an internal comment reference to point at the relocated scoring module.