refactor: consolidate scoring modules into acestep/core/scoring/ #603
Move DiT alignment scoring and LM perplexity scoring from top-level
acestep/ into acestep/core/scoring/ with clearer naming:
- dit_alignment_score.py -> core/scoring/dit_alignment.py (MusicStampsAligner)
                         -> core/scoring/dit_score.py (MusicLyricScorer)
                         -> core/scoring/_dtw.py (shared DTW utilities)
- test_time_scaling.py -> core/scoring/lm_score.py
Split the 877-LOC dit_alignment_score.py into three focused modules
to comply with the 200 LOC module size policy.
Update all import references in:
- lyric_timestamp.py, lyric_score.py, scoring.py (UI)
- lyric_alignment_test.py (mock module paths)
- llm_inference.py (comment reference)
Delete the old top-level files (no backward-compat shims needed as
all internal callers have been updated).
Add 17 unit tests covering DTW, median filter, PMI functions, and
reward score aggregation.
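For orientation, the resulting package layout implied by the description above (file roles paraphrased; `__init__.py` and `scoring_test.py` appear in the review below):

```
acestep/core/scoring/
├── __init__.py        # public exports (MusicStampsAligner, MusicLyricScorer, ...)
├── _dtw.py            # shared DTW and median-filter utilities
├── dit_alignment.py   # MusicStampsAligner: LRC timestamp alignment
├── dit_score.py       # MusicLyricScorer: coverage/monotonicity/confidence metrics
├── lm_score.py        # LM perplexity / PMI reward scoring
└── scoring_test.py    # 17 unit tests
```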
📝 Walkthrough

The codebase reorganizes alignment and scoring functionality from a monolithic top-level module into the acestep/core/scoring/ package.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)

acestep/core/scoring/lm_score.py (1)

1-471: 🛠️ Refactor suggestion | 🟠 Major — Module exceeds 200 LOC hard cap (471 lines).

This file is 471 LOC, which significantly exceeds the 200 LOC hard cap. While the PR correctly split dit_alignment_score.py, this module should also be split by responsibility:

- Pure scoring utilities (`pmi_score`, `pmi_to_normalized_score`) → `_pmi_utils.py`
- Model context management (`_load_scoring_model_context`, `_get_logits_and_target_for_scoring`) → `_scoring_context.py`
- Scoring calculations (`_calculate_topk_recall`, `_calculate_metadata_recall`, `_calculate_log_prob`, `calculate_reward_score`) → keep in `lm_score.py`
- Public API (`calculate_pmi_score_per_condition`) → keep in `lm_score.py`, importing from the split modules

Based on learnings: "If a module would exceed 200 LOC: split by responsibility before merging, or add a short justification in PR notes with a concrete follow-up split plan."
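For illustration, a minimal sketch of what the proposed `_pmi_utils.py` extraction could contain. The signatures and the normalization choice are assumptions inferred from the function names above, not the actual implementation in `lm_score.py`:

```python
# Hypothetical _pmi_utils.py -- pure functions with no model or I/O dependencies.
import math


def pmi_score(log_prob_cond: float, log_prob_uncond: float) -> float:
    """Pointwise mutual information: how much conditioning raises the
    target's log-probability over the unconditional baseline."""
    return log_prob_cond - log_prob_uncond


def pmi_to_normalized_score(pmi: float, scale: float = 1.0) -> float:
    """Map a raw PMI value into [0, 1] for reward aggregation.

    A sigmoid is assumed here; any monotonic squashing function would do.
    """
    z = max(min(pmi / scale, 60.0), -60.0)  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))
```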
🧹 Nitpick comments (3)

acestep/core/scoring/lm_score.py (1)

443-446: Redundant variable reassignment.

`prompt_uncond` is already defined on line 413 and reused on line 434. The reassignment on line 445 is redundant since the value is identical.

♻️ Remove redundant reassignment

```diff
 log_prob_cond = _calculate_log_prob(llm_handler, formatted_prompt, target_text)
-prompt_uncond = llm_handler.build_formatted_prompt_for_understanding(audio_codes="NO USER INPUT", is_negative_prompt=False)
 log_prob_uncond = _calculate_log_prob(llm_handler, prompt_uncond, target_text)
```

acestep/core/scoring/scoring_test.py (1)

122-126: Prefix unused variable with underscore.

The `explanation` variable is unpacked but never used. Per Python convention, prefix it with `_` to indicate it is intentionally unused.

♻️ Proposed fix

```diff
 def test_no_scores_returns_zero(self):
     """Empty scores dict should return zero reward."""
-    total, explanation = calculate_reward_score({})
+    total, _ = calculate_reward_score({})
     self.assertEqual(total, 0.0)
```

acestep/core/scoring/__init__.py (1)

13-20: Consider sorting `__all__` alphabetically.

Static analysis suggests sorting for consistency. This is optional since the current grouping by source module is also logical.

♻️ Alphabetically sorted

```diff
 __all__ = [
+    "MusicLyricScorer",
     "MusicStampsAligner",
+    "SentenceTimestamp",
     "TokenTimestamp",
-    "SentenceTimestamp",
-    "MusicLyricScorer",
     "calculate_pmi_score_per_condition",
     "calculate_reward_score",
 ]
```
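For reference, a consumer-side import of the package's public API, using the names exported in the `__all__` above:

```python
from acestep.core.scoring import (
    MusicLyricScorer,
    MusicStampsAligner,
    SentenceTimestamp,
    TokenTimestamp,
    calculate_pmi_score_per_condition,
    calculate_reward_score,
)
```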
In `acestep/core/scoring/_dtw.py` (excerpt, lines 90-107):

```python
def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
    """
    Apply median filter to tensor.

    Args:
        x: Input tensor
        filter_width: Width of median filter

    Returns:
        Filtered tensor
    """
    pad_width = filter_width // 2
    if x.shape[-1] <= pad_width:
        return x
    if x.ndim == 2:
        x = x[None, :]
    x = F.pad(x, (filter_width // 2, filter_width // 2, 0, 0), mode="reflect")
    result = x.unfold(-1, filter_width, 1).sort()[0][..., filter_width // 2]
```
Guard against non-positive filter_width values.
unfold will throw when filter_width <= 0; fail fast with a clear error to avoid hidden runtime crashes.
🔧 Proposed fix

```diff
 def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
     """
     Apply median filter to tensor.
 @@
     Returns:
         Filtered tensor
     """
+    if filter_width <= 0:
+        raise ValueError("filter_width must be a positive integer")
     pad_width = filter_width // 2
     if x.shape[-1] <= pad_width:
         return x
```

🤖 Prompt for AI Agents
return x🤖 Prompt for AI Agents
In `@acestep/core/scoring/_dtw.py` around lines 90 - 107, In median_filter, add an
early validation that raises a clear ValueError when filter_width is not a
positive integer (e.g., if filter_width <= 0), so unfold() cannot be called with
non-positive sizes; place this check at the start of median_filter (before
computing pad_width and before any padding/unfolding). Ensure the error message
mentions median_filter and the invalid filter_width value so callers fail fast
with a clear diagnostic.
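As a quick illustration of the fail-fast behavior, assuming the guard above is applied and that `median_filter` returns the filtered tensor (the excerpt cuts off before its return statement):

```python
import torch

from acestep.core.scoring._dtw import median_filter

x = torch.rand(4, 16)

# Odd positive widths preserve the last dimension: reflect-pad width//2 on
# each side, then a size-filter_width unfold with stride 1.
smoothed = median_filter(x, filter_width=5)
assert smoothed.shape[-1] == x.shape[-1]

# With the guard in place, a non-positive width fails immediately with a
# clear message instead of surfacing as an opaque unfold() error.
try:
    median_filter(x, filter_width=0)
except ValueError as err:
    print(err)  # filter_width must be a positive integer
```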
| """ | ||
| DiT Alignment Module | ||
|
|
||
| Provides lyrics-to-audio timestamp alignment using cross-attention matrices | ||
| from the DiT model. Produces LRC-format timestamps via bidirectional | ||
| consensus denoising and DTW. | ||
| """ |
🛠️ Refactor suggestion | 🟠 Major
Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.
Consider separating preprocessing/denoising, timestamp derivation, and LRC formatting into smaller modules.
As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
🤖 Prompt for AI Agents
In `@acestep/core/scoring/dit_alignment.py` around lines 1 - 7, The module DiT
alignment (dit_alignment.py) is over the 200-LOC cap; split its responsibilities
into three focused modules: extract preprocessing/denoising logic into a new
function/class (e.g., preprocess_and_denoise or Denoiser) that handles
bidirectional consensus denoising and any input normalization, move timestamp
derivation/DTW logic into a separate function (e.g., derive_timestamps or
TimestampDeriver) responsible for computing cross-attention-derived alignments
and DTW consensus, and isolate LRC formatting into a tiny formatter function
(e.g., format_lrc or LRCFormatter) that converts timestamps to LRC strings;
update dit_alignment.py to orchestrate these components (calling
preprocess_and_denoise -> derive_timestamps -> format_lrc) and add a short TODO
or tracking issue comment referencing the new modules if further decomposition
is planned to satisfy the size guideline.
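A minimal sketch of the orchestration this prompt describes. The helper functions below are runnable stand-ins for the suggested extracted modules, not existing code; the frame-to-seconds factor is an arbitrary assumption:

```python
# Hypothetical shape of dit_alignment.py after the proposed split.
import torch


def preprocess_and_denoise(cross_attn: torch.Tensor) -> torch.Tensor:
    """Stand-in for the extracted denoising step (bidirectional consensus)."""
    return cross_attn  # real consensus-denoising logic would go here


def derive_timestamps(consensus: torch.Tensor, tokens: list[str]) -> list[tuple[str, float]]:
    """Stand-in for the extracted DTW timestamp derivation."""
    # Toy mapping: peak-attention frame per token, at an assumed 0.1 s/frame.
    frames = consensus.argmax(dim=-1)
    return [(tok, frames[i].item() * 0.1) for i, tok in enumerate(tokens)]


def format_lrc(stamps: list[tuple[str, float]]) -> str:
    """Stand-in for the extracted LRC formatter."""
    return "\n".join(f"[{int(t // 60):02d}:{t % 60:05.2f}] {tok}" for tok, t in stamps)


def align_lyrics(cross_attn: torch.Tensor, tokens: list[str]) -> str:
    """Thin orchestrator: denoise -> derive timestamps -> format as LRC."""
    consensus = preprocess_and_denoise(cross_attn)
    stamps = derive_timestamps(consensus, tokens)
    return format_lrc(stamps)
```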
| """ | ||
| DiT Lyrics Quality Scorer | ||
|
|
||
| Evaluates lyrics-to-audio alignment quality using cross-attention energy | ||
| matrices. Computes Coverage, Monotonicity, and Path Confidence metrics | ||
| via tensor operations. | ||
| """ |
🛠️ Refactor suggestion | 🟠 Major
Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.
A split between attention preprocessing, path construction, and scoring metrics would keep each module under the cap.
As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
🤖 Prompt for AI Agents
In `@acestep/core/scoring/dit_score.py` around lines 1 - 7, The module
acestep/core/scoring/dit_score.py is over the 200‑LOC cap; split it into three
focused modules: move attention preprocessing logic into a new module exposing a
preprocess_attention(...) function, move dynamic/path construction into a
path_builder module exposing build_alignment_path(...) (or similar), and move
the metric computations into a scoring_metrics module with
compute_coverage(...), compute_monotonicity(...), and
compute_path_confidence(...). Update dit_score.py to become a thin orchestrator
that imports these functions and wires them together (retain original public API
names if any), add unit tests for each new module, and include a short follow‑up
TODO in the repo README describing the split and where to find each piece.
Ensure all imports, docstrings, and type hints are preserved when extracting
logic.
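For illustration, one plausible reading of the three metrics this prompt names, under assumed tensor conventions (an alignment `path` of `(token_idx, frame_idx)` pairs over an attention `energy` matrix); the actual formulas live in `dit_score.py`:

```python
# Hypothetical scoring_metrics sketch -- conventions are assumptions.
import torch


def compute_coverage(path: torch.Tensor, n_tokens: int) -> float:
    """Fraction of lyric tokens touched by the alignment path."""
    return path[:, 0].unique().numel() / max(n_tokens, 1)


def compute_monotonicity(path: torch.Tensor) -> float:
    """Fraction of consecutive path steps that never move backwards in time."""
    if path.shape[0] < 2:
        return 1.0
    steps = path[1:, 1] - path[:-1, 1]
    return (steps >= 0).float().mean().item()


def compute_path_confidence(energy: torch.Tensor, path: torch.Tensor) -> float:
    """Mean attention energy along the alignment path."""
    return energy[path[:, 0], path[:, 1]].mean().item()
```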
Summary by CodeRabbit

Chores

- Relocated alignment and scoring modules into the core/scoring package for improved code organization.

Tests

- Added unit tests covering DTW, median filter, PMI scoring, and reward score aggregation.

Documentation

- Updated an internal comment reference to point at the relocated scoring module.