
refactor: consolidate scoring modules into acestep/core/scoring/#603

Merged
ChuxiJ merged 1 commit into main from refactor/scoring-module-consolidation on Feb 16, 2026
Conversation

ChuxiJ (Contributor) commented on Feb 16, 2026

Move DiT alignment scoring and LM perplexity scoring from top-level acestep/ into acestep/core/scoring/ with clearer naming:

  • dit_alignment_score.py -> core/scoring/dit_alignment.py (MusicStampsAligner)
                           -> core/scoring/dit_score.py (MusicLyricScorer)
                           -> core/scoring/_dtw.py (shared DTW utilities)
  • test_time_scaling.py -> core/scoring/lm_score.py

Split the 877-LOC dit_alignment_score.py into three focused modules to comply with the 200 LOC module size policy.

Update all import references in:

  • lyric_timestamp.py, lyric_score.py, scoring.py (UI)
  • lyric_alignment_test.py (mock module paths)
  • llm_inference.py (comment reference)

Delete the old top-level files (no backward-compat shims needed as all internal callers have been updated).

Add 17 unit tests covering DTW, median filter, PMI functions, and reward score aggregation.
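The _dtw.py utilities themselves are not shown on this page; as a rough illustration of what they cover, here is a textbook dynamic-programming DTW over a precomputed cost matrix. The name dtw_path and its signature are assumptions for this sketch, not the module's actual API:

```python
import math

def dtw_path(cost):
    """Classic DTW over an N x M cost matrix; returns the warping path."""
    n, m = len(cost), len(cost[0])
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # each cell extends the cheapest of the three admissible moves
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1]
            )
    # backtrack from the bottom-right corner to recover the path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        best = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
        if best == acc[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif best == acc[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

On a cost matrix with a cheap diagonal, the recovered path hugs that diagonal, which is the monotone alignment property the scoring metrics below rely on.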

Summary by CodeRabbit

  • Chores

    • Reorganized scoring and alignment functionality into a dedicated core/scoring package for improved code organization.
    • Updated internal import paths throughout the codebase to reflect the new module structure.
  • Tests

    • Added comprehensive unit tests for DTW implementations and scoring utility functions.
  • Documentation

    • Updated internal code references to reflect module reorganization.

coderabbitai bot (Contributor) commented on Feb 16, 2026

📝 Walkthrough


The codebase reorganizes alignment and scoring functionality from a monolithic acestep/dit_alignment_score.py module into a modular structure under acestep/core/scoring/. The old module is replaced with separate files for DTW utilities, alignment, and scoring. Import paths are updated across the codebase, and a public API is established through package initialization.

Changes

Module Restructuring
  Files: acestep/dit_alignment_score.py, acestep/core/scoring/_dtw.py, acestep/core/scoring/dit_alignment.py, acestep/core/scoring/dit_score.py
  Removes the monolithic alignment/scoring module and replaces it with a modular structure: DTW utilities, music timestamp alignment with token/sentence grouping and LRC formatting, and music lyric scoring with coverage/monotonicity/confidence metrics.

Public API Exposure
  Files: acestep/core/scoring/__init__.py
  Introduces an __all__ export list and re-exports six public entities: MusicStampsAligner, TokenTimestamp, SentenceTimestamp, MusicLyricScorer, calculate_pmi_score_per_condition, calculate_reward_score.

Updated Import Paths
  Files: acestep/core/generation/handler/lyric_alignment_test.py, acestep/core/generation/handler/lyric_score.py, acestep/core/generation/handler/lyric_timestamp.py
  Corrects mock and import targets from the old module paths to the new acestep.core.scoring.* submodule locations.

Documentation & Reference Updates
  Files: acestep/core/scoring/lm_score.py, acestep/llm_inference.py
  Minor formatting and documentation adjustments: docstring fixes, dash-character consistency, and reference-path updates from test modules to the new scoring module.

Test Coverage
  Files: acestep/core/scoring/scoring_test.py
  Adds comprehensive unit tests for DTW, median filter, PMI scoring functions, and reward score aggregation, with edge cases and typical usage scenarios.

UI Module Updates
  Files: acestep/ui/gradio/events/results/scoring.py
  Updates the import source for calculate_pmi_score_per_condition from the old test module to the new acestep.core.scoring.lm_score.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hops through refactored code so neat,
DTW paths and scoring beat,
Modular burrows now align,
From scattered hutches, a structure divine!
Public APIs bloom, imports redesigned,
This rabbit's warren: reorganized and refined! 🌟

🚥 Pre-merge checks | ✅ 4 passed

  • Description Check: ✅ Passed (check skipped; CodeRabbit’s high-level summary is enabled)
  • Title Check: ✅ Passed (the PR title clearly and concisely describes the main change: consolidating scoring modules into a core/scoring directory)
  • Docstring Coverage: ✅ Passed (docstring coverage is 100.00%, above the required 80.00% threshold)
  • Merge Conflict Detection: ✅ Passed (no merge conflicts detected when merging into main)



coderabbitai bot left a comment
Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
acestep/core/scoring/lm_score.py (1)

1-471: 🛠️ Refactor suggestion | 🟠 Major

Module exceeds 200 LOC hard cap (471 lines).

This file is 471 LOC, which significantly exceeds the 200 LOC hard cap. While the PR correctly split dit_alignment_score.py, this module should also be split by responsibility:

  • Pure scoring utilities (pmi_score, pmi_to_normalized_score) → _pmi_utils.py
  • Model context management (_load_scoring_model_context, _get_logits_and_target_for_scoring) → _scoring_context.py
  • Scoring calculations (_calculate_topk_recall, _calculate_metadata_recall, _calculate_log_prob, calculate_reward_score) → keep in lm_score.py
  • Public API (calculate_pmi_score_per_condition) → keep in lm_score.py, imports from split modules

Based on learnings: "If a module would exceed 200 LOC: split by responsibility before merging, or add a short justification in PR notes with a concrete follow-up split plan."

🤖 Fix all issues with AI agents
In `@acestep/core/scoring/_dtw.py`:
- Around line 90-107: In median_filter, add an early validation that raises a
clear ValueError when filter_width is not a positive integer (e.g., if
filter_width <= 0), so unfold() cannot be called with non-positive sizes; place
this check at the start of median_filter (before computing pad_width and before
any padding/unfolding). Ensure the error message mentions median_filter and the
invalid filter_width value so callers fail fast with a clear diagnostic.

In `@acestep/core/scoring/dit_alignment.py`:
- Around line 1-7: The module DiT alignment (dit_alignment.py) is over the
200-LOC cap; split its responsibilities into three focused modules: extract
preprocessing/denoising logic into a new function/class (e.g.,
preprocess_and_denoise or Denoiser) that handles bidirectional consensus
denoising and any input normalization, move timestamp derivation/DTW logic into
a separate function (e.g., derive_timestamps or TimestampDeriver) responsible
for computing cross-attention-derived alignments and DTW consensus, and isolate
LRC formatting into a tiny formatter function (e.g., format_lrc or LRCFormatter)
that converts timestamps to LRC strings; update dit_alignment.py to orchestrate
these components (calling preprocess_and_denoise -> derive_timestamps ->
format_lrc) and add a short TODO or tracking issue comment referencing the new
modules if further decomposition is planned to satisfy the size guideline.

In `@acestep/core/scoring/dit_score.py`:
- Around line 1-7: The module acestep/core/scoring/dit_score.py is over the
200‑LOC cap; split it into three focused modules: move attention preprocessing
logic into a new module exposing a preprocess_attention(...) function, move
dynamic/path construction into a path_builder module exposing
build_alignment_path(...) (or similar), and move the metric computations into a
scoring_metrics module with compute_coverage(...), compute_monotonicity(...),
and compute_path_confidence(...). Update dit_score.py to become a thin
orchestrator that imports these functions and wires them together (retain
original public API names if any), add unit tests for each new module, and
include a short follow‑up TODO in the repo README describing the split and where
to find each piece. Ensure all imports, docstrings, and type hints are preserved
when extracting logic.
🧹 Nitpick comments (3)
acestep/core/scoring/lm_score.py (1)

443-446: Redundant variable reassignment.

prompt_uncond is already defined on line 413 and reused on line 434. The reassignment on line 445 is redundant since the value is identical.

♻️ Remove redundant reassignment
         log_prob_cond = _calculate_log_prob(llm_handler, formatted_prompt, target_text)

-        prompt_uncond = llm_handler.build_formatted_prompt_for_understanding(audio_codes="NO USER INPUT", is_negative_prompt=False)
         log_prob_uncond = _calculate_log_prob(llm_handler, prompt_uncond, target_text)
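For context, the conditional/unconditional log probabilities above are exactly the ingredients of a PMI-style score. A minimal sketch of what pmi_score and pmi_to_normalized_score (named in the review above) could compute; the logistic normalization and its scale parameter are assumptions, not the actual implementation:

```python
import math

def pmi_score(log_prob_cond: float, log_prob_uncond: float) -> float:
    # pointwise mutual information: how much the condition raises
    # the target's log-probability relative to the unconditional prompt
    return log_prob_cond - log_prob_uncond

def pmi_to_normalized_score(pmi: float, scale: float = 1.0) -> float:
    # squash the unbounded PMI into (0, 1); `scale` controls sharpness
    return 1.0 / (1.0 + math.exp(-pmi / scale))
```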
acestep/core/scoring/scoring_test.py (1)

122-126: Prefix unused variable with underscore.

The explanation variable is unpacked but never used. Per Python convention, prefix it with _ to indicate it's intentionally unused.

♻️ Proposed fix
     def test_no_scores_returns_zero(self):
         """Empty scores dict should return zero reward."""
-        total, explanation = calculate_reward_score({})
+        total, _ = calculate_reward_score({})
         self.assertEqual(total, 0.0)
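The test above pins down one piece of the contract: an empty scores dict yields a zero total. A hypothetical aggregator consistent with that contract (the real weighting scheme is not shown on this page, so the unweighted mean is an assumption):

```python
def calculate_reward_score(scores: dict) -> tuple:
    """Aggregate per-metric scores into (total, explanation)."""
    if not scores:
        return 0.0, "no scores"
    # assumption: unweighted mean; the actual module may weight metrics
    total = sum(scores.values()) / len(scores)
    explanation = ", ".join(f"{name}={value:.3f}" for name, value in scores.items())
    return total, explanation
```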
acestep/core/scoring/__init__.py (1)

13-20: Consider sorting __all__ alphabetically.

Static analysis suggests sorting for consistency. This is optional since the current grouping by source module is also logical.

♻️ Alphabetically sorted
 __all__ = [
+    "MusicLyricScorer",
     "MusicStampsAligner",
+    "SentenceTimestamp",
     "TokenTimestamp",
-    "SentenceTimestamp",
-    "MusicLyricScorer",
     "calculate_pmi_score_per_condition",
     "calculate_reward_score",
 ]
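The suggested ordering is a plain ASCII sort (all uppercase-initial names before lowercase-initial ones), which Python's built-in sorted reproduces directly:

```python
exports = [
    "MusicStampsAligner",
    "TokenTimestamp",
    "SentenceTimestamp",
    "MusicLyricScorer",
    "calculate_pmi_score_per_condition",
    "calculate_reward_score",
]
# ASCII sort: uppercase letters precede all lowercase letters
print(sorted(exports))
```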

Comment on lines +90 to +107

def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
    """
    Apply median filter to tensor.

    Args:
        x: Input tensor
        filter_width: Width of median filter

    Returns:
        Filtered tensor
    """
    pad_width = filter_width // 2
    if x.shape[-1] <= pad_width:
        return x
    if x.ndim == 2:
        x = x[None, :]
    x = F.pad(x, (filter_width // 2, filter_width // 2, 0, 0), mode="reflect")
    result = x.unfold(-1, filter_width, 1).sort()[0][..., filter_width // 2]
⚠️ Potential issue | 🟡 Minor

Guard against non-positive filter_width values.

unfold will throw when filter_width <= 0; fail fast with a clear error to avoid hidden runtime crashes.

🔧 Proposed fix
 def median_filter(x: torch.Tensor, filter_width: int) -> torch.Tensor:
     """
     Apply median filter to tensor.
@@
     Returns:
         Filtered tensor
     """
+    if filter_width <= 0:
+        raise ValueError("filter_width must be a positive integer")
     pad_width = filter_width // 2
     if x.shape[-1] <= pad_width:
         return x
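To illustrate the failure mode the fix targets, here is a pure-Python stand-in (lists instead of tensors; not the torch implementation) with the same reflect padding and the proposed guard:

```python
def median_filter(x: list, filter_width: int) -> list:
    """Sliding-window median with reflect padding; validates filter_width."""
    if filter_width <= 0:
        raise ValueError(
            f"median_filter: filter_width must be positive, got {filter_width}"
        )
    pad = filter_width // 2
    if len(x) <= pad:
        return list(x)
    # reflect padding mirrors neighbours without repeating the edge sample
    padded = x[pad:0:-1] + list(x) + x[-2:-pad - 2:-1]
    return [
        sorted(padded[i:i + filter_width])[filter_width // 2]
        for i in range(len(x))
    ]
```

Without the guard, a zero or negative width would only fail deep inside the windowing step; with it, callers get an immediate, named diagnostic.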

Comment on lines +1 to +7
"""
DiT Alignment Module

Provides lyrics-to-audio timestamp alignment using cross-attention matrices
from the DiT model. Produces LRC-format timestamps via bidirectional
consensus denoising and DTW.
"""
🛠️ Refactor suggestion | 🟠 Major

Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.

Consider separating preprocessing/denoising, timestamp derivation, and LRC formatting into smaller modules.

As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
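The LRC-formatting slice suggested above is genuinely tiny. A sketch with an assumed (start_seconds, text) input shape; the actual SentenceTimestamp structure may differ:

```python
def format_lrc(sentences: list) -> str:
    """Render (start_seconds, text) pairs as LRC lines: [mm:ss.xx]text"""
    lines = []
    for start, text in sentences:
        minutes, seconds = divmod(start, 60.0)
        lines.append(f"[{int(minutes):02d}:{seconds:05.2f}]{text}")
    return "\n".join(lines)
```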


Comment on lines +1 to +7
"""
DiT Lyrics Quality Scorer

Evaluates lyrics-to-audio alignment quality using cross-attention energy
matrices. Computes Coverage, Monotonicity, and Path Confidence metrics
via tensor operations.
"""
🛠️ Refactor suggestion | 🟠 Major

Module exceeds the 200‑LOC cap; please split further or add a concrete follow‑up plan.

A split between attention preprocessing, path construction, and scoring metrics would keep each module under the cap.

As per coding guidelines: Target module size: optimal <= 150 LOC, hard cap 200 LOC.
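As a sketch of what the suggested scoring_metrics split might contain, given an alignment path of (token_index, frame_index) pairs. The function names follow the review's suggestion; the bodies are illustrative assumptions, not the module's code:

```python
def compute_coverage(path: list, n_tokens: int) -> float:
    # fraction of lyric tokens the alignment path touches at least once
    return len({token for token, _ in path}) / n_tokens

def compute_monotonicity(path: list) -> float:
    # fraction of consecutive steps whose frame index never moves backward
    if len(path) < 2:
        return 1.0
    forward = sum(
        1 for (_, f0), (_, f1) in zip(path, path[1:]) if f1 >= f0
    )
    return forward / (len(path) - 1)
```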


ChuxiJ merged commit f95ca3a into main on Feb 16, 2026
3 checks passed