Skip to content

Conversation

@devonjkohler
Copy link
Contributor

@devonjkohler devonjkohler commented Sep 9, 2025

Motivation and Context

Feature branch for MSstats+ adding weights to differential analysis if anomaly detection model used in the converter. Includes vignettes and tests.

Changes

  • AFT imputation for linear summarization model
  • anomaly model weights for linear summarization model
  • Carry variance to differential analysis for weights in DA

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have run the devtools::document() command after my changes and committed the added files

Summary by CodeRabbit

  • New Features

    • Added global control for AFT fitting iterations; linear summarization now emits per-run variances and accepts covariance-based weighting. Anomaly scores enable variance/weighting and require linear summarization. Group comparison and linear fits use variance/weights when available. Censored-value handling now runs whenever imputation is enabled.
  • Documentation

    • Updated docs/examples for new parameters and outputs; added MSstats+ vignette describing weighted workflows.
  • Tests

    • Extensive new tests covering weighting, variance behavior, anomaly-score validation, and linear summarization.
  • Style

    • Minor logging and formatting adjustments.

@coderabbitai
Copy link

coderabbitai bot commented Sep 9, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Adds variance-aware linear summarization and anomaly-derived weighting across the pipeline. Threads a new aft_iterations param and covariance matrices through R and C++ layers, updates get_linear_summary to return LogIntensities+Variance, relaxes censored handling, adds anomaly-weight utilities, updates docs/tests, and adds a vignette.

Changes

Cohort / File(s) Summary
C++ linear summary API
R/RcppExports.R, src/RcppExports.cpp, src/linear_summary.cpp
get_linear_summary signature expanded to accept cov_mat; returns a DataFrame with LogIntensities and Variance; computes variance via cᵀ Σ c; Rcpp wrapper and call registration updated to 5 args.
Data processing & summarization
R/dataProcess.R, R/utils_imputation.R, R/utils_censored.R, R/utils_checks.R, R/utils_summarization.R, R/utils_summarization_prepare.R, R/utils_output.R
Adds aft_iterations param throughout summarization paths; threads aft_iterations into AFT survreg maxiter; relaxes TMP-only guard for censored handling; ensures AnomalyScores column presence and validation; uses newABUNDANCE and optional weights in linear fits; changes some prepare/finalize flows to prefer TMP.
Anomaly-weight utilities
R/utils_anomaly_weights.R
New helpers .normalize_anomalies and .calculate_weights that normalize anomaly scores and derive per-row weights (PPCA path, edge-case handling; contains commented placeholder stats).
Model / groupComparison weighting
R/utils_groupcomparison.R, R/groupComparison.R
Compute weight_input = 1/Variance when Variance available and pass weights=weight_input to lm/lmer variants across design branches; minor logging reflow.
Imputation & survival
R/utils_imputation.R
.fitSurvival signature gains aft_iterations and uses it for survreg maxiter; callers must supply the new arg (one internal caller currently unchanged).
Summarization helpers
R/utils_summarization_prepare.R, R/utils_summarization.R
Ensure ANOMALYSCORES exists; route prepare/finalize to TMP-centric helpers; add optional weighting in .fitLinearModel and switch non-labeled dependent var to newABUNDANCE.
Input validation
R/utils_checks.R, man/dot-checkDataProcessParams.Rd
.checkDataProcessParams gains input_columns; error if AnomalyScores present but summaryMethod != "linear"; required columns list updated.
Docs (Rd) and examples
man/*.Rd (multiple files listed)
Document aft_iterations and updated function usage/examples; update MSstatsSummarizationOutput return text and example calls to match new signatures.
Tests
inst/tinytest/test_dataProcess.R, inst/tinytest/test_groupComparison.R, inst/tinytest/test_utils_groupComparison_model.R, inst/tinytest/test_utils_summarization.R
New/extended tests for anomaly-aware linear summarization, variance propagation, weight propagation in models, and groupComparison weighting; small test formatting fix.
Vignette
vignettes/MSstatsPlus.Rmd
New vignette describing MSstats+ workflow: anomaly scoring, weighting, AFT settings, example pipelines and diagnostics.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant R as dataProcess / MSstatsSummarize*
  participant L as MSstatsSummarizeSingleLinear
  participant F as lm/lmer + vcov
  participant C as .Call(_MSstats_get_linear_summary)
  participant CPP as get_linear_summary (C++)

  R->>L: single_protein, impute, censored_symbol, remove50missing, aft_iterations, equal_variances
  L->>F: fit model (use newABUNDANCE / weights)
  F-->>L: coefficients + vcov (cov_mat)
  L->>C: input, coefs, counts, is_labeled, cov_mat
  C->>CPP: pass args
  CPP-->>C: DataFrame{LogIntensities, Variance}
  C-->>L: DataFrame
  L-->>R: run-level data (+ Variance) and survival-related data
Loading
sequenceDiagram
  autonumber
  participant GC as groupComparison
  participant FM as .fitModelForGroupComparison
  participant M as lm/lmer

  GC->>FM: summarized input (may include Variance)
  FM->>FM: if Variance present -> weight_input = 1/Variance else NULL
  FM->>M: fit model with weights=weight_input
  M-->>FM: weighted fit (coeffs, SE, p-values)
  FM-->>GC: results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Feature model weights #174 — implements similar changes: adds cov_mat parameter and variance output to get_linear_summary, threads AFT/weighting changes through summarization pipeline.

Suggested labels

Review effort 2/5

Suggested reviewers

  • mstaniak

Pre-merge checks (2 warnings, 1 inconclusive)

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The current description includes a Motivation section and high-level Change bullets but omits detailed enumeration of the many signature updates, utility functions, documentation edits, vignette additions, and specific tests implemented, and the Testing section remains a placeholder rather than listing the actual unit tests. Please expand the Changes section with detailed bullet points for each API, documentation, and test update, and fill in the Testing section by listing the specific new or modified unit tests and vignette validations.
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title “Feature model weights” is brief but too generic to convey the primary change, which is introducing anomaly-derived weighting and variance propagation into the summarization and group-comparison workflows; it does not reference anomaly scores or differential analysis and could be misinterpreted. Please revise the title to clearly reflect the main feature, for example “Add anomaly-derived weights and variance propagation to group comparison” or similar.

Poem

A nibble of code, a hop through the night,
I weighted the runs till variances felt right.
With cov-mats tucked snug and AFT set to run,
I balanced the scores till the summaries shone.
Thump-thump — MSstats+: lighter, clever, and fun. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between beecaae and 5b972e7.

📒 Files selected for processing (1)
  • vignettes/MSstatsPlus.Rmd (1 hunks)
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature-model_weights

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 24

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)
R/utils_censored.R (4)

36-44: Guard against empty quantiles / zero IQR producing NaN cutoff and NA flags.

When no data or IQR == 0, cutoff_lower becomes NaN; downstream comparisons yield NA and can cascade into errors. Compute vals first, handle IQR <= eps, and allow a NA cutoff to skip quantile-based censoring.

-            quantiles = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", 
-                              quantile(ABUNDANCE, 
-                                       prob = c(0.01, 0.25, 0.5, 0.75, 
-                                                censored_cutoff), 
-                                       na.rm = TRUE)]
-            iqr = quantiles[4] - quantiles[2]
-            multiplier = (quantiles[5] - quantiles[4]) / iqr
-            cutoff_lower = (quantiles[2] - multiplier * iqr) 
+            vals = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", ABUNDANCE]
+            if (length(vals)) {
+                quantiles = quantile(vals,
+                                     prob = c(0.01, 0.25, 0.5, 0.75, censored_cutoff),
+                                     na.rm = TRUE, names = FALSE)
+                iqr = quantiles[4] - quantiles[2]
+                if (!is.finite(iqr) || iqr <= .Machine$double.eps) {
+                    cutoff_lower = quantiles[2]
+                } else {
+                    multiplier = (quantiles[5] - quantiles[4]) / iqr
+                    cutoff_lower = quantiles[2] - multiplier * iqr
+                }
+            } else {
+                cutoff_lower = NA_real_
+            }

44-46: Prevent NA-propagation in the censoring predicate.

Ensure cutoff_lower is finite before vectorized comparison to avoid NA censor flags.

-            input$censored = !is.na(input$INTENSITY) & 
-                input$LABEL == "L" &
-                input$ABUNDANCE < cutoff_lower
+            input$censored = is.finite(cutoff_lower) &
+                !is.na(input$INTENSITY) &
+                input$LABEL == "L" &
+                input$ABUNDANCE < cutoff_lower

47-54: Fix if(NA) runtime risk when cutoff_lower is NA.

if (cutoff_lower <= 0 & ...) can become if (NA) and error. Also prefer && for scalars; outer guard already ensures missing_symbol is non-null.

-            if (cutoff_lower <= 0 & !is.null(missing_symbol) & missing_symbol == "0") {
+            if (is.finite(cutoff_lower) && missing_symbol == "0" && cutoff_lower <= 0) {
                 zero_one_filter = !is.na(input$ABUNDANCE) & input$ABUNDANCE <= 0
                 input$censored = ifelse(zero_one_filter, TRUE, input$censored)
             }
-            if (!is.null(missing_symbol) & missing_symbol == "NA") {
+            if (missing_symbol == "NA") {
                 input$censored = ifelse(is.na(input$INTENSITY), TRUE, 
                                         input$censored)
             }

32-79: Restrict censored‐flagging to TMP or properly support linear summarization.

  • MSstatsHandleMissing() ignores summary_method, so with impute=TRUE non-TMP paths now get input$censored marks.
  • Both .prepareSummary() (always using .prepareTMP()) and MSstatsSummarizeSingleLinear() apply AFT imputation based on those flags.
  • Either guard MSstatsHandleMissing() with if (summary_method=="TMP") (or force impute=FALSE for non-TMP), or re-enable and complete .prepareLinear()/.finalizeLinear() branches and update docs to reflect linear AFT support.
R/utils_summarization.R (1)

210-216: WLS not applied: lm() re-fit uses a NULL symbol instead of the computed column.

wls.fit = lm(..., weights = weight) references a local NULL variable, not input$weight. This disables the intended heteroscedasticity correction.

Apply:

-        wls.fit = lm(formula(fit), data = input, weights = weight)
+        wls.fit = lm(formula(fit), data = input, weights = input$weight)
R/utils_imputation.R (2)

15-15: Avoid set.seed() in package code.

Setting the global RNG seed is a side effect that disrupts users. survreg() is deterministic; remove the seed.

Apply:

-    set.seed(100)
+    # no RNG needed here

60-65: .addSurvivalPredictions fails to pass aft_iterations.

This will error after the signature change or silently use the default if added. Thread the parameter through.

Apply:

-.addSurvivalPredictions = function(input) {
+.addSurvivalPredictions = function(input, aft_iterations = 90) {
     LABEL = NULL
     
-    survival_fit = .fitSurvival(input[LABEL == "L", ])
+    survival_fit = .fitSurvival(input[LABEL == "L", ], aft_iterations)
     predict(survival_fit, newdata = input)
 }
man/MSstatsSummarizeSingleTMP.Rd (1)

18-27: Fix user-facing typos and terminology (AFT)

  • “Accelated failure model.” → “Accelerated Failure Time (AFT) model.”
  • “intensites” → “intensities.”
  • “Null assumes …” → “NULL assumes …” (R constant).

Apply:

-\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
-TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. 
+\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
+TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) using an Accelerated Failure Time (AFT) model. 
 FALSE uses the values assigned by cutoffCensored}
@@
-Null assumes that all NA intensites are randomly missing.}
+NULL assumes that all NA intensities are randomly missing.}
R/groupComparison.R (1)

91-96: Robust labeled check and remove dead code

data$FeatureLevelData$Label is inconsistent with later use of LABEL and may be NULL for minimal inputs. Either fix case or guard; also labeled isn’t used later here.

-    labeled = data.table::uniqueN(data$FeatureLevelData$Label) > 1
+    labeled = FALSE
+    if (!is.null(data$FeatureLevelData) &&
+        "LABEL" %in% colnames(data$FeatureLevelData)) {
+        labeled = data.table::uniqueN(data$FeatureLevelData$LABEL) > 1
+    }

Optionally drop the variable entirely if unused.

src/linear_summary.cpp (1)

184-186: Avoid NA propagation in contrast construction

NumericVector default-initializes to NA; leaving other entries as NA risks NaN results in get_quant/get_quant_var. Initialize to zeros.

-        NumericVector contrast_matrix(num_runs);
+        NumericVector contrast_matrix(num_runs, 0.0);
         contrast_matrix[run_id] = 1;
-        NumericVector contrast_matrix(num_runs);
+        NumericVector contrast_matrix(num_runs, 0.0);
         contrast_matrix[num_runs - 1] = 1;

Also applies to: 197-199

🧹 Nitpick comments (27)
R/utils_censored.R (2)

61-66: Avoid logging NaN/Inf cutoff values.

Only log cutoff when finite; still log the generic symbol message.

-            getOption("MSstatsMsg")("INFO", msg)
-            getOption("MSstatsMsg")("INFO", msg_2)
-            getOption("MSstatsLog")("INFO", msg)
-            getOption("MSstatsLog")("INFO", msg_2)
+            if (is.finite(cutoff_lower)) {
+                getOption("MSstatsMsg")("INFO", msg)
+                getOption("MSstatsLog")("INFO", msg)
+            }
+            getOption("MSstatsMsg")("INFO", msg_2)
+            getOption("MSstatsLog")("INFO", msg_2)

68-75: Minor: prefer && for scalar checks.

Use short-circuit operators in scalar conditionals for clarity and to skip unnecessary RHS work.

-            if (missing_symbol == '0') {
+            if (missing_symbol == '0') {
                 input$censored = input$LABEL == "L" & 
                     !is.na(input$INTENSITY) &
                     (input$INTENSITY == 1 | input$ABUNDANCE <= 0)
-            } else if (missing_symbol == 'NA') {
+            } else if (missing_symbol == 'NA') {
                 input$censored = input$LABEL == "L" & is.na(input$ABUNDANCE)
             }
R/utils_summarization.R (2)

136-143: Redundant branches in non-missing filter.

Both branches under censored_symbol == "NA" and else are identical. Simplify.

Apply:

-    if (!is.null(censored_symbol)) {
-        if (censored_symbol == "NA") {
-            nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored
-        } else {
-            nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored 
-        }
-    } else {
-        nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY)
-    }
+    if (!is.null(censored_symbol)) {
+        nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored
+    } else {
+        nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY)
+    }

211-213: Typo in comment.

“valuaes” → “values”.

Apply:

-        ## loess fitted valuaes are predicted sd
+        ## loess fitted values are predicted sd
R/utils_anomaly_weights.R (3)

13-16: Optional: validate required columns early.

Fail fast if ANOMALYSCORES, PEPTIDE, censored, or imputation_var are missing.

Proposed prelude:

required_cols <- c("ANOMALYSCORES","PEPTIDE","censored","imputation_var")
missing <- setdiff(required_cols, names(data))
if (length(missing)) stop("Missing columns: ", paste(missing, collapse=", "))

1-11: Consider simpler normalization.

Subtract per-PEPTIDE min without adding global_min; this guarantees non-negativity and preserves within-peptide contrasts.

Apply:

-    global_min = min(data$ANOMALYSCORES, na.rm = TRUE)
-    data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE]
-    data[, normalized_score := ANOMALYSCORES - local_min + global_min]
+    data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE]
+    data[, normalized_score := ANOMALYSCORES - local_min]

1-1: Add data.table import for merge method (if not already).

If not present elsewhere, add roxygen import to ensure data.table methods are available.

Add at file top:

#' @import data.table
R/utils_checks.R (2)

154-157: Use numeric NA for AnomalyScores.

Set a numeric type to avoid downstream type coercion surprises and to align with later NA_real_ usage.

Apply:

-  input$AnomalyScores = NA
+  input$AnomalyScores = NA_real_

198-202: Minor: improve user-facing error message grammar.

Polish for clarity.

-            "So far, only label-free or reference-labeled experiment are supported. - stop"))
+            "So far, only label-free or reference-labeled experiments are supported. - stop"))

Also applies to: 204-207

R/utils_summarization_prepare.R (1)

30-33: Ensure ANOMALYSCORES is numeric.

Match utils_checks.R (NA_real_) to keep a consistent numeric type.

-    if (!"ANOMALYSCORES" %in% colnames(input)) {
-        input[, ANOMALYSCORES := NA]
-    }
+    if (!"ANOMALYSCORES" %in% colnames(input)) {
+        input[, ANOMALYSCORES := NA_real_]
+    }
R/RcppExports.R (1)

12-14: Add cov_mat validation in get_linear_summary
In R/RcppExports.R, before the .Call, assert that cov_mat is a square matrix with ncol(cov_mat) == length(coefs) and is symmetric, otherwise stop with a clear error.

inst/tinytest/test_utils_groupComparison_model.R (1)

16-18: Harden the test against edge cases (zero/NA variances, numeric fuzz).

Add a case with 0/NA Variance to assert the model either guards or errors clearly; also consider a tolerance in expect_equal to avoid float issues.

man/MSstatsSummarizeSingleTMP.Rd (1)

53-55: Prefer named argument in example to guard against signature drift

Use a named parameter for aft_iterations in the example.

-single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]],
-                                                   impute, cens, FALSE, 100)
+single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]],
+                                                   impute, cens, FALSE,
+                                                   aft_iterations = 100)
inst/tinytest/test_utils_summarization.R (3)

1-29: Synthetic fixture looks good; minor cleanup optional

ANOMALYSCORES isn’t used in this unit and could be dropped to keep the fixture minimal. No action required if you prefer the explicitness.


31-35: Equality check is fine; consider a tolerance for portability

Direct equality on numeric vectors can be brittle across platforms. Add a tiny tolerance.

-expect_equal(summarized$weights, single_protein$weights,
-             info = "Weights from model should match input weights")
+expect_equal(summarized$weights, single_protein$weights, tolerance = 1e-12,
+             info = "Weights from model should match input weights")

41-47: Keying into coefficients by name is brittle

If internal names change, this will break. Optionally guard with expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients)) before comparing, or retrieve by matching the feature column.

+expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients))
R/groupComparison.R (1)

97-101: Tighten log/message phrasing

“Start to test and get inference in whole plot” is awkward. Shorten and unify LOG/MSG.

-    getOption("MSstatsLog")(
-        "INFO", "== Start to test and get inference in whole plot")
-    getOption("MSstatsMsg")(
-        "INFO", " == Start to test and get inference in whole plot ...")
+    getOption("MSstatsLog")("INFO", "== Starting group comparison on all proteins")
+    getOption("MSstatsMsg")("INFO", " == Starting group comparison on all proteins ...")
inst/tinytest/test_groupComparison.R (2)

49-54: Use a small tolerance for numeric equality

Floating-point quirks can cause spurious failures across platforms.

-expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, 
+expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, tolerance = 1e-12, 
              info = "SE values for Q9UFW8 should be the same")
-expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue,
+expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue, tolerance = 1e-12,
              info = "pvalue values for Q9UFW8 should be the same")

59-62: Inequality assertions are good; optionally add tolerance bands

If borderline, consider asserting a minimum relative/absolute difference to avoid flakes.

inst/tinytest/test_dataProcess.R (1)

29-48: Consider setting a seed before imputation

If AFT or related steps introduce any stochasticity, set a seed for stability.

+# Ensure deterministic behavior if any stochastic steps are used
+set.seed(42)
 msstats_input = data.table::data.table(
src/linear_summary.cpp (1)

193-194: Numerical safety: clamp tiny negative variances to 0

Floating-point noise can yield slightly negative values. Clamp to non-negative.

-        variances[run_id] = get_quant_var(contrast, cov_mat);
+        variances[run_id] = std::max(0.0, get_quant_var(contrast, cov_mat));
-        variances.push_back(get_quant_var(contrast, cov_mat));
+        variances.push_back(std::max(0.0, get_quant_var(contrast, cov_mat)));

Also applies to: 203-204

vignettes/MSstatsPlus.Rmd (4)

116-123: Standardize column names or use named args for clarity

The example uses sanitized column names (FGShapeQualityScore(MS2)) while earlier text uses raw Spectronaut headers with spaces/dots. Clarify that the converter standardizes names, or standardize explicitly to avoid user confusion.

 msstats_input = MSstatsConvert::SpectronauttoMSstatsFormat(
     spectronaut_raw, annotation, intensity = 'PeakArea',
     excludedFromQuantificationFilter = TRUE, # Pre-filtering
     calculateAnomalyScores=TRUE, # Key parameter to use MSstats+
-    anomalyModelFeatures=c("FGShapeQualityScore(MS2)",
-                           "FGShapeQualityScore(MS1)",
-                           "EGDeltaRT"), # Quality metrics
+    # Use standardized column names (no dots/spaces) or call name standardizer
+    anomalyModelFeatures=c("FGShapeQualityScore(MS2)",
+                           "FGShapeQualityScore(MS1)",
+                           "EGDeltaRT"), # Quality metrics (standardized)

And add a short note right below explaining that MSstatsConvert standardizes feature names (dots/spaces removed), or show how to map raw headers to standardized ones.


347-362: Fix axis labels in the fold-change plot

Labels don’t match aesthetics (x=Protein, y=abs(log2FC)).

 ggplot(data=merged_results) + 
     geom_col(aes(x=Protein, y=abs(log2FC), fill=Model), position="dodge") +
     geom_hline(yintercept=1, linetype="dashed", color="black", linewidth=1) +
     theme_bw(base_size =16) + 
     labs(title="Log fold change comparison",
-         x="MSstats+",
-         y="Base MSstats")
+         x="Protein",
+         y="|log2FC|")

367-373: Fix axis labels in the SE plot

Same issue as above.

 ggplot(data=merged_results) + 
     geom_col(aes(x=Protein, y=SE, fill=Model), position="dodge") +
     theme_bw(base_size =16) + 
     labs(title="Standard error comparison",
-         x="MSstats+",
-         y="Base MSstats")
+         x="Protein",
+         y="SE")

260-265: Use named arguments for MSstatsAnomalyScores example

Unlabeled positional args reduce readability and risk misordering.

 input = MSstatsConvert::MSstatsAnomalyScores(
-    msstats_input, 
-    c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), 
-    c("mean_decrease", "mean_decrease", "dispersion_increase"),
-    .5, 100, run_order, 100, "auto", 1)
+    data = msstats_input, 
+    features = c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), 
+    featureTemporal = c("mean_decrease", "mean_decrease", "dispersion_increase"),
+    trainQuantile = 0.5, sampleSize = 100, runOrder = run_order,
+    n_trees = 100, max_depth = "auto", numberOfCores = 1)
R/utils_groupcomparison.R (1)

466-478: Validate has_imputed attribute presence and type
After retrieving has_imputed <- attr(summarized_list, "has_imputed"), add a check to ensure it exists and is a length-one logical, for example:

if (!is.logical(has_imputed) || length(has_imputed) != 1) {
  stop("`summarized_list` must have a single logical `has_imputed` attribute")
}

This prevents unexpected NULL or non-logical values from causing runtime errors.

R/dataProcess.R (1)

128-129: Document the default value choice for aft_iterations

The default value of 90 iterations for AFT model fitting seems arbitrary. Consider documenting why this specific value was chosen or make it configurable based on convergence criteria.

Consider adding a comment explaining the rationale:

-    numberOfCores = 1, aft_iterations=90
+    numberOfCores = 1, aft_iterations=90  # Default based on empirical testing for typical proteomics datasets
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8479d28 and dcdfd78.

📒 Files selected for processing (26)
  • R/RcppExports.R (1 hunks)
  • R/dataProcess.R (15 hunks)
  • R/groupComparison.R (3 hunks)
  • R/utils_anomaly_weights.R (1 hunks)
  • R/utils_censored.R (1 hunks)
  • R/utils_checks.R (5 hunks)
  • R/utils_groupcomparison.R (4 hunks)
  • R/utils_imputation.R (2 hunks)
  • R/utils_output.R (2 hunks)
  • R/utils_summarization.R (1 hunks)
  • R/utils_summarization_prepare.R (2 hunks)
  • inst/tinytest/test_dataProcess.R (1 hunks)
  • inst/tinytest/test_groupComparison.R (1 hunks)
  • inst/tinytest/test_utils_groupComparison_model.R (1 hunks)
  • inst/tinytest/test_utils_summarization.R (1 hunks)
  • man/MSstatsSummarizationOutput.Rd (1 hunks)
  • man/MSstatsSummarize.Rd (1 hunks)
  • man/MSstatsSummarizeSingleLinear.Rd (2 hunks)
  • man/MSstatsSummarizeSingleTMP.Rd (2 hunks)
  • man/MSstatsSummarizeWithMultipleCores.Rd (1 hunks)
  • man/MSstatsSummarizeWithSingleCore.Rd (2 hunks)
  • man/dataProcess.Rd (3 hunks)
  • man/dot-checkDataProcessParams.Rd (2 hunks)
  • src/RcppExports.cpp (2 hunks)
  • src/linear_summary.cpp (4 hunks)
  • vignettes/MSstatsPlus.Rmd (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
R/RcppExports.R (1)
src/comparison_contrasts.cpp (1)
  • get_estimable_fixed_random (6-27)
inst/tinytest/test_groupComparison.R (1)
src/comparison_contrasts.cpp (1)
  • get_estimable_fixed_random (6-27)
src/RcppExports.cpp (1)
src/linear_summary.cpp (2)
  • get_linear_summary (169-210)
  • get_linear_summary (169-173)
src/linear_summary.cpp (1)
src/RcppExports.cpp (10)
  • contrast (21-21)
  • cov_mat (52-52)
  • input (32-32)
  • input (48-48)
  • coefs (33-33)
  • coefs (49-49)
  • counts (35-35)
  • counts (50-50)
  • is_labeled (36-36)
  • is_labeled (51-51)
🔇 Additional comments (31)
R/utils_summarization.R (1)

167-172: LGTM: add weights to non-labeled linear models.

Passing weights into the non-labeled single- and multi-feature lm() calls aligns with the PR’s goal.

R/utils_imputation.R (1)

20-49: LGTM: consistently using aft_iterations for survreg control.

Replacing hard-coded maxiter with aft_iterations across labeled/unlabeled paths is correct.

man/dot-checkDataProcessParams.Rd (1)

7-15: Approve docs update: R/utils_checks.R includes @param input_columns matching the function’s signature; re-run roxygen to regenerate the updated manual page.

R/utils_checks.R (2)

99-105: Good guard: disallow AnomalyScores with non-linear summary.

This check is appropriate and prevents silent misuse.


91-94: All call sites include the new input_columns parameter
Verified that R/dataProcess.R passes colnames(raw) as the seventh argument and the man page (man/dot-checkDataProcessParams.Rd) documents the updated signature.

man/MSstatsSummarizationOutput.Rd (1)

56-56: Example updated to 7-arg MSstatsSummarizeWithSingleCore call—LGTM.

inst/tinytest/test_utils_groupComparison_model.R (1)

1-19: Good: verifies inverse-variance weights are plumbed to full_fit.

R/utils_output.R (1)

35-35: Example updated to pass aft_iterations—LGTM.

man/MSstatsSummarizeWithSingleCore.Rd (1)

62-63: Example update to 7 args—LGTM.

man/MSstatsSummarizeSingleLinear.Rd (1)

41-41: Example aligns with new signature—LGTM.

R/groupComparison.R (3)

101-104: LGTM: purely formatting for readability

Argument wrapping improves readability; no functional change.


175-186: LGTM: early return for parallel path

The guard and formatting are fine.


275-287: LGTM: clearer multi-line call

Pure formatting. No functional concerns.

inst/tinytest/test_groupComparison.R (2)

19-21: Row-count parity assertion is good

Validates multi-core equivalence at the surface level.


22-36: Weighted vs unweighted fixtures look reasonable

Variance pattern should exercise weighting differences for Q96S19 and sameness for Q9UFW8.

inst/tinytest/test_dataProcess.R (8)

27-28: LGTM: basic sanity on tech reps/fractions

Good minimal check that FeatureLevelData is populated.


51-55: Error pattern check is appropriate

Pattern is specific enough; no change needed.


64-74: Good validations on Variance column

Covers presence, positivity, finiteness, and per-protein constancy (for Q9UFW8).


96-104: Comparative variance test is meaningful

Captures expected behavior shift with lower anomaly scores.


106-115: Absence-of-anomaly control is sound

Identity check guards against accidental reuse of the same variance vector.


149-156: Single-protein linear summarization: good coverage

Parameters (including aft_iterations and equal_variances) are exercised; shapes validated.


171-189: Variance stratification assertions are strong

Clear expectation that runs 4–5 have substantially higher variance.


190-210: Proximity check to feature-level abundances is a nice consistency test

Solid behavioral validation of weighted aggregation.

src/linear_summary.cpp (2)

106-112: Variance via cᵀΣc looks correct

Computation is standard and efficient with Armadillo. Nice addition.


206-209: Return schema LGTM

Two-column DataFrame with "LogIntensities" and "Variance" is clear and minimal.

src/RcppExports.cpp (1)

43-56: Wrapper update and registration look correct

Signature, argument forwarding, and CallEntries (arity=5) are consistent.

Please confirm R/RcppExports.R was re-generated and committed so the .Call arity matches (5 args). If helpful, I can generate a quick check script.

Also applies to: 74-74

R/utils_groupcomparison.R (1)

150-155: LGTM! Consistent weight propagation across model fitting paths

The implementation correctly applies variance-based weights to all linear and mixed-effects models across different experimental designs (not repeated, repeated, time-course). This ensures consistent weighting throughout the group comparison workflow.

R/dataProcess.R (4)

545-553: Good convergence handling implementation

The implementation properly handles AFT model convergence warnings by catching them and setting predicted values to NA when convergence fails. This prevents silent failures and ensures data integrity.


486-488: Resolved: get_linear_summary signature matches C++ implementation
Verified that the R call get_linear_summary(single_protein, cf, counts, label, cov_mat) aligns exactly with the C++ definition in src/linear_summary.cpp and its Rcpp export in src/RcppExports.cpp. No further action needed.


427-429: Ignore n_obs column existence concern

The columns n_obs and n_obs_run are generated in MSstatsPrepareForSummarization (via .prepareLinear/.prepareTMP in R/utils_summarization_prepare.R) before downstream filtering, so the runtime filter on those fields in dataProcess.R is valid.

Likely an incorrect or invalid review comment.


555-559: Ignore converged flag scope concern
The local variable converged is defined in MSstatsSummarizeSingleTMP and <<- within the warning handler correctly updates that same variable. No scoping issue remains.

Likely an incorrect or invalid review comment.

Comment on lines +37 to +45
protein_level_data_no_anomalies = protein_level_data_with_anomalies
protein_level_data_no_anomalies$Variance = NULL
QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
use_log_file = FALSE)
model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
use_log_file = FALSE)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Ensure minimal input structure matches groupComparison expectations

groupComparison() calls MSstatsPrepareForGroupComparison(), which references FeatureLevelData$LABEL. Provide a stub FeatureLevelData to avoid NULL access in minimal tests and use named arguments.

-protein_level_data_no_anomalies$Variance = NULL
-QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
-QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
-model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
-                        use_log_file = FALSE)
-model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
-                         use_log_file = FALSE)
+protein_level_data_no_anomalies$Variance = NULL
+stub_feature = data.table::data.table(LABEL = factor("L"))
+QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies,
+                          FeatureLevelData = stub_feature)
+QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies,
+                            FeatureLevelData = stub_feature)
+model_no_anomaly = groupComparison(contrast.matrix = "pairwise",
+                                   data = QuantDataNoAnomaly,
+                                   use_log_file = FALSE)
+model_with_anomaly = groupComparison(contrast.matrix = "pairwise",
+                                     data = QuantDataWithAnomaly,
+                                     use_log_file = FALSE)

🏁 Script executed:

#!/bin/bash
# Locate the definition of groupComparison and its signature
rg -nP 'groupComparison\s*<-' -g '*.R'
# Locate MSstatsPrepareForGroupComparison and check for FeatureLevelData$LABEL usage
rg -n 'MSstatsPrepareForGroupComparison' -g '*.R' -C5

Length of output: 6387


🏁 Script executed:

#!/bin/bash
# Show lines around groupComparison usage of FeatureLevelData
sed -n '80,110p' R/groupComparison.R

# Show lines around MSstatsPrepareForGroupComparison and LABEL usage
sed -n '120,150p' R/groupComparison.R

Length of output: 3049


Ensure tests supply a minimal FeatureLevelData stub
Add a stub FeatureLevelData with a LABEL column and include it in your QuantData lists; switch to named args for groupComparison():

-protein_level_data_no_anomalies$Variance = NULL
-QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
-QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
-model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
-                        use_log_file = FALSE)
-model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
-                         use_log_file = FALSE)
+protein_level_data_no_anomalies$Variance = NULL
+stub_feature = data.table::data.table(LABEL = factor("L"))
+QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies,
+                          FeatureLevelData = stub_feature)
+QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies,
+                            FeatureLevelData = stub_feature)
+model_no_anomaly = groupComparison(contrast.matrix = "pairwise",
+                                   data = QuantDataNoAnomaly,
+                                   use_log_file = FALSE)
+model_with_anomaly = groupComparison(contrast.matrix = "pairwise",
+                                     data = QuantDataWithAnomaly,
+                                     use_log_file = FALSE)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
protein_level_data_no_anomalies = protein_level_data_with_anomalies
protein_level_data_no_anomalies$Variance = NULL
QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
use_log_file = FALSE)
model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
use_log_file = FALSE)
protein_level_data_no_anomalies = protein_level_data_with_anomalies
protein_level_data_no_anomalies$Variance = NULL
# Minimal FeatureLevelData stub required by groupComparison()
stub_feature = data.table::data.table(LABEL = factor("L"))
QuantDataNoAnomaly = list(
ProteinLevelData = protein_level_data_no_anomalies,
FeatureLevelData = stub_feature
)
QuantDataWithAnomaly = list(
ProteinLevelData = protein_level_data_with_anomalies,
FeatureLevelData = stub_feature
)
model_no_anomaly = groupComparison(
contrast.matrix = "pairwise",
data = QuantDataNoAnomaly,
use_log_file = FALSE
)
model_with_anomaly = groupComparison(
contrast.matrix = "pairwise",
data = QuantDataWithAnomaly,
use_log_file = FALSE
)
🤖 Prompt for AI Agents
In inst/tinytest/test_groupComparison.R around lines 37 to 45, the tests call
groupComparison with QuantData lists that lack a minimal FeatureLevelData stub
and use positional args; add a FeatureLevelData element to both
QuantDataNoAnomaly and QuantDataWithAnomaly containing at least a LABEL column
(matching the peptides/features in ProteinLevelData), include it in the lists,
and call groupComparison with named arguments (e.g., data = QuantDataNoAnomaly,
design = "pairwise", use_log_file = FALSE) so the function receives the required
FeatureLevelData and parameters explicitly.

Comment on lines +27 to 29
numberOfCores = 1,
aft_iterations = 90
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

API/docs alignment: aft_iterations added; clarify scope and enforce anomaly→linear rule.

  • Ensure all public wrappers default aft_iterations = 90 consistently.
  • Explicitly state whether aft_iterations affects TMP only (AFT imputation) or also linear paths.
  • The text “if anomaly detection … ‘linear’ must be used” should be enforced in code (e.g., parameter validation) if not already.

To confirm consistency of defaults across the repo:

Also applies to: 71-73, 118-119


🏁 Script executed:

#!/bin/bash
rg -nP '\baft_iterations\s*=\s*90\b' -C2

Length of output: 3445


Clarify aft_iterations scope and enforce anomaly→linear rule

  • Docs (e.g. man/dataProcess.Rd and related *.Rd) must state that aft_iterations only affects AFT/TMP imputation and has no effect on linear summarization.
  • In each MSstatsSummarize… function, add a parameter check to error if impute="anomaly" is used without method="linear".

Comment on lines 57 to 58
output = output = MSstatsSummarizationOutput(input, summarized, processed,
method, impute, cens)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix duplicated assignment in example ("output = output =").

Change to a single assignment to avoid confusing users.

🤖 Prompt for AI Agents
In man/MSstatsSummarizationOutput.Rd around lines 57 to 58, there is a
duplicated assignment in the example ("output = output = ..."); update this to a
single assignment by removing the extra "output =" so the example shows a
single, correct assignment of the function result to the variable.

impute,
censored_symbol,
remove50missing,
aft_iterations,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Docs incomplete: aft_iterations lacks an argument entry and examples use the old signature.

Add an \item for aft_iterations and update examples to include it. Also fix typos in the imputation description.

 \usage{
 MSstatsSummarize(
   proteins_list,
   method,
   impute,
   censored_symbol,
   remove50missing,
-  aft_iterations,
+  aft_iterations,
   equal_variance
 )
 }
 \arguments{
 \item{proteins_list}{list of processed feature-level data}
@@
-\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
-TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. 
+\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
+TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model. 
 FALSE uses the values assigned by cutoffCensored}
@@
-Null assumes that all NA intensites are randomly missing.}
+Null assumes that all NA intensities are randomly missing.}
+
+\item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.}
@@
-summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, TRUE)
+summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aft_iterations,
\usage{
MSstatsSummarize(
proteins_list,
method,
impute,
censored_symbol,
remove50missing,
aft_iterations,
equal_variance
)
}
\arguments{
\item{proteins_list}{list of processed feature-level data}
\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'.
TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model.
FALSE uses the values assigned by cutoffCensored}
\item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.}
}
\examples{
summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)
}
🤖 Prompt for AI Agents
In man/MSstatsSummarize.Rd around line 13, the documentation is missing an \item
entry for the aft_iterations argument and the examples still use the old
function signature; update the Rd file by adding a properly formatted
\item{aft_iterations}{<brief description>} entry describing its purpose and
default behavior, revise the Examples section to call MSstatsSummarize with the
aft_iterations parameter included, and correct typographical errors in the
imputation description (fix spelling and grammar) so the description reads
clearly and accurately.

Comment on lines +7 to +14
MSstatsSummarizeSingleLinear(
single_protein,
impute,
censored_symbol,
remove50missing,
aft_iterations = 90,
equal_variances = TRUE
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Arguments section incomplete for expanded signature.

Document impute, censored_symbol, remove50missing, aft_iterations, and equal_variances to match usage.

🤖 Prompt for AI Agents
In man/MSstatsSummarizeSingleLinear.Rd around lines 7–14, the Arguments section
doesn’t document the expanded signature parameters impute, censored_symbol,
remove50missing, aft_iterations, and equal_variances; add individual Rd entries
for each parameter including its name, expected type, default value (impute —
logical or method, censored_symbol — character scalar, remove50missing —
logical, aft_iterations — integer default 90, equal_variances — logical default
TRUE) and a one-sentence description of what it controls and its effect on the
function (e.g., impute: whether/how missing values are imputed; censored_symbol:
symbol representing censored observations; remove50missing: whether to drop
features with ≥50% missing; aft_iterations: number of AFT EM iterations;
equal_variances: whether to assume equal variances across groups), keeping
wording consistent with other argument entries and updating any \usage or
\examples sections if they reference the expanded signature.

Comment on lines +110 to 115
# if (method == "TMP") {
input = .prepareTMP(input, impute, censored_symbol)
# } else {
# input = .prepareLinear(input, FALSE, censored_symbol)
# }
input
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: method is ignored; linear path never used.

.prepareSummary always routes to TMP, contradicting linear-weighting work and the AnomalyScores guard in utils_checks.R. This will alter behavior for linear summarization.

Restore dispatch:

-# if (method == "TMP") {
-input = .prepareTMP(input, impute, censored_symbol)
-# } else {
-#     input = .prepareLinear(input, FALSE, censored_symbol)
-# }
+if (identical(method, "TMP")) {
+    input = .prepareTMP(input, impute, censored_symbol)
+} else {
+    input = .prepareLinear(input, FALSE, censored_symbol)
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# if (method == "TMP") {
input = .prepareTMP(input, impute, censored_symbol)
# } else {
# input = .prepareLinear(input, FALSE, censored_symbol)
# }
input
if (identical(method, "TMP")) {
input = .prepareTMP(input, impute, censored_symbol)
} else {
input = .prepareLinear(input, FALSE, censored_symbol)
}
input
🤖 Prompt for AI Agents
In R/utils_summarization_prepare.R around lines 110 to 115, the code
unconditionally calls .prepareTMP and ignores the method parameter so the linear
path is never used; restore proper dispatch by checking the method value (e.g.,
if (method == "TMP") { input = .prepareTMP(input, impute, censored_symbol) }
else { input = .prepareLinear(input, FALSE, censored_symbol) }) and ensure the
function returns input; also add a defensive branch or error for
unknown/unsupported method values to match the AnomalyScores guard in
utils_checks.R.

Comment on lines +159 to +164
if (all(!is.na(input$weights))){
weight_input = input$weights
} else {
weight_input = NULL
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Guard weights: require non-missing, finite, and non-negative before using in lm().

Avoid silent acceptance of NULL/invalid weights (e.g., negative or Inf) which will error or skew fits.

Apply:

-    if (all(!is.na(input$weights))){
-        weight_input = input$weights
-    } else {
-        weight_input = NULL
-    }
+    if (!is.null(input$weights) &&
+        all(is.finite(input$weights)) &&
+        all(input$weights >= 0) &&
+        all(!is.na(input$weights))) {
+        weight_input = input$weights
+    } else {
+        weight_input = NULL
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (all(!is.na(input$weights))){
weight_input = input$weights
} else {
weight_input = NULL
}
if (!is.null(input$weights) &&
all(is.finite(input$weights)) &&
all(input$weights >= 0) &&
all(!is.na(input$weights))) {
weight_input = input$weights
} else {
weight_input = NULL
}
🤖 Prompt for AI Agents
In R/utils_summarization.R around lines 159 to 164, the current guard only
checks for non-NA weights and may pass invalid values (NA, NaN, Inf, negative)
to lm(); update the validation to ensure input$weights is numeric and contains
only finite, non-missing, and non-negative values before assigning weight_input,
otherwise set weight_input <- NULL; specifically reject or coerce non-numeric
inputs, check all(is.finite(weights)) and all(weights >= 0), and only then use
them in lm(), otherwise fall back to NULL to avoid errors or skewed fits.

Comment on lines 168 to 174
// [[Rcpp::export]]
NumericVector get_linear_summary(const DataFrame& input,
DataFrame get_linear_summary(const DataFrame& input,
const NumericVector& coefs,
const NumericMatrix& counts,
const bool is_labeled) {
const bool is_labeled,
const NumericMatrix& cov_mat) {
CharacterVector runs = input["RUN"];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Validate cov_mat shape and alignment with coefs

Add defensive checks to fail fast on mismatched dimensions and non-square inputs.

 DataFrame get_linear_summary(const DataFrame& input,
                                  const NumericVector& coefs,
                                  const NumericMatrix& counts,
                                  const bool is_labeled,
                                  const NumericMatrix& cov_mat) {
+    // Basic validation
+    if (cov_mat.nrow() != cov_mat.ncol()) {
+        stop("cov_mat must be a square matrix (n x n).");
+    }
+    if (cov_mat.nrow() != coefs.size()) {
+        stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs).");
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// [[Rcpp::export]]
NumericVector get_linear_summary(const DataFrame& input,
DataFrame get_linear_summary(const DataFrame& input,
const NumericVector& coefs,
const NumericMatrix& counts,
const bool is_labeled) {
const bool is_labeled,
const NumericMatrix& cov_mat) {
CharacterVector runs = input["RUN"];
// [[Rcpp::export]]
DataFrame get_linear_summary(const DataFrame& input,
const NumericVector& coefs,
const NumericMatrix& counts,
const bool is_labeled,
const NumericMatrix& cov_mat) {
// Basic validation
if (cov_mat.nrow() != cov_mat.ncol()) {
stop("cov_mat must be a square matrix (n x n).");
}
if (cov_mat.nrow() != coefs.size()) {
stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs).");
}
CharacterVector runs = input["RUN"];
// ... rest of function ...
}
🤖 Prompt for AI Agents
In src/linear_summary.cpp around lines 168 to 174, add defensive validation to
ensure cov_mat is square and its dimensions match the number of coefficients:
check cov_mat.nrow() == cov_mat.ncol() and cov_mat.nrow() == coefs.size(); if
either check fails call Rcpp::stop with a clear message (e.g. "cov_mat must be
square and size must equal length(coefs)"); perform these checks early
(immediately after obtaining inputs) so the function fails fast on dimension
mismatches.

Comment on lines +141 to +146
metric.9 Possible values are `mean_decrease`, `mean_increase`, and
`dispersion_increase`. These indicate the direction a metric is changes which
indicates a drop in quality (i.e., decrease, increase, or can go in either
direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a
deviation in expected retention can be either higher or lower.
- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo and grammar in parameter description

Stray “9” and wording nit in anomalyModelFeatureTemporal.

-   metric.9 Possible values are `mean_decrease`, `mean_increase`, and 
-   `dispersion_increase`. These indicate the direction a metric is changes which
+   metric. Possible values are `mean_decrease`, `mean_increase`, and 
+   `dispersion_increase`. These indicate the direction a metric changes which
    indicates a drop in quality (i.e., decrease, increase, or can go in either
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
metric.9 Possible values are `mean_decrease`, `mean_increase`, and
`dispersion_increase`. These indicate the direction a metric is changes which
indicates a drop in quality (i.e., decrease, increase, or can go in either
direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a
deviation in expected retention can be either higher or lower.
- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The
metric Possible values are `mean_decrease`, `mean_increase`, and
`dispersion_increase`. These indicate the direction a metric changes which
indicates a drop in quality (i.e., decrease, increase, or can go in either
direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a
deviation in expected retention can be either higher or lower.
- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The
🤖 Prompt for AI Agents
In vignettes/MSstatsPlus.Rmd around lines 141 to 146, the parameter description
for anomalyModelFeatureTemporal contains a stray "9" after "metric." and awkward
wording; remove the stray "9" and rephrase the sentence for clarity (e.g.,
"Possible values are `mean_decrease`, `mean_increase`, and
`dispersion_increase`. These indicate the direction a metric changes that
corresponds to a drop in quality..."), and fix the surrounding grammar/wording
(e.g., "indicates the direction a metric changes" -> "indicate the direction a
metric changes") so the paragraph reads cleanly and flows correctly.

"FG.ShapeQualityScore (MS1)",
"EG.ApexRT")
anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid triple-colon usage in vignette

MSstatsConvert:::.standardizeColnames is internal; vignettes should not rely on internals. Show a documented alternative or inline a simple sanitizer.

-anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures)
+# Example: simple standardization without relying on internals
+anomalyModelFeatures = gsub("\\s+", "", anomalyModelFeatures)            # drop spaces
+anomalyModelFeatures = gsub("\\.", "", anomalyModelFeatures)             # drop dots
+anomalyModelFeatures = gsub("[^A-Za-z0-9()]", "", anomalyModelFeatures)  # keep alnum/()
+# Or, if MSstatsConvert exposes a public helper, prefer that here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/dataProcess.R (1)

220-225: Export aft_iterations to workers in PSOCK clusters

aft_iterations is used inside parLapply closures but not exported to the cluster; this may cause “object 'aft_iterations' not found” on some setups.

Apply:

 parallel::clusterExport(cl, c("MSstatsSummarizeSingleTMP", 
                               "MSstatsSummarizeSingleLinear",
                               "input", "impute", "censored_symbol",
-                              "remove50missing", "protein_indices", 
-                              "equal_variance"), 
+                              "remove50missing", "protein_indices", 
+                              "equal_variance", "aft_iterations"), 
                               envir = function_environment)
♻️ Duplicate comments (4)
R/utils_output.R (1)

111-115: Bug: .finalizeInput always routes through TMP; linear summarization path is dead.
Restoring method-based branching is necessary; otherwise linear runs get TMP-specific fields and imputation logic.

Apply:

-    # if (method == "TMP") {
-    input = .finalizeTMP(input, censored_symbol, impute, summarized)
-    # } else {
-    #     input = .finalizeLinear(input, censored_symbol)
-    # }
+    if (identical(method, "TMP")) {
+        input = .finalizeTMP(input, censored_symbol, impute, summarized)
+    } else {
+        input = .finalizeLinear(input, censored_symbol)
+    }
R/dataProcess.R (2)

375-393: Add robust error handling for survival fit and predict in linear path

Same concern as previously raised: failures in .fitSurvival/predict can silently produce NA and derail downstream steps. Add tryCatch and logging; also compute imputation_var only on success.

Apply:

-    if (impute & any(single_protein[["censored"]])) {
-        survival_fit = .fitSurvival(single_protein[LABEL == "L", cols,
-                                                   with = FALSE],
-                                    aft_iterations)
-        sigma2 = survival_fit$scale^2
-        single_protein[, c("predicted", "imputation_var") := {
-            pred = predict(survival_fit, newdata = .SD, se.fit = TRUE)
-            list(pred$fit, pred$se.fit^2 + sigma2)
-        }]
+    if (impute && any(single_protein[["censored"]])) {
+        survival_fit <- tryCatch(
+            .fitSurvival(single_protein[LABEL == "L", cols, with = FALSE],
+                         aft_iterations),
+            error = function(e) {
+                msg <- paste("*** warning: Survival model fitting failed for protein",
+                             unique(single_protein$PROTEIN), ":", conditionMessage(e))
+                getOption("MSstatsLog")("WARN", msg)
+                NULL
+            }
+        )
+        if (!is.null(survival_fit)) {
+            sigma2 <- survival_fit$scale^2
+            single_protein[, c("predicted", "imputation_var") := {
+                pr <- tryCatch(
+                    predict(survival_fit, newdata = .SD, se.fit = TRUE),
+                    error = function(e) {
+                        msg <- paste("*** warning: Survival predict failed for protein",
+                                     unique(single_protein$PROTEIN), ":", conditionMessage(e))
+                        getOption("MSstatsLog")("WARN", msg)
+                        list(fit = rep(NA_real_, .N), se.fit = rep(NA_real_, .N))
+                    }
+                )
+                list(pr$fit, ifelse(is.na(pr$se.fit), NA_real_, pr$se.fit^2 + sigma2))
+            }]
+        } else {
+            single_protein[, c("predicted", "imputation_var") := list(NA_real_, NA_real_)]
+        }

394-399: Validate ANOMALYSCORES before reciprocal; guard missing/invalid values

Prevent division by zero/negatives/Inf and missing column cases; log and fall back to NA weights. Same as earlier suggestion.

Apply:

-    if (all(!is.na(single_protein$ANOMALYSCORES))){
-        single_protein$weights = 1 / single_protein$ANOMALYSCORES
-    } else {
-        single_protein$weights = NA
-    }
+    if ("ANOMALYSCORES" %in% colnames(single_protein) &&
+        all(is.finite(single_protein$ANOMALYSCORES))) {
+        if (any(single_protein$ANOMALYSCORES <= 0)) {
+            msg <- paste("*** warning: Invalid (<=0) anomaly scores for protein",
+                         unique(single_protein$PROTEIN), "- weights not applied")
+            getOption("MSstatsLog")("WARN", msg)
+            single_protein$weights <- NA_real_
+        } else {
+            single_protein$weights <- 1 / single_protein$ANOMALYSCORES
+        }
+    } else {
+        single_protein$weights <- NA_real_
+    }
R/utils_groupcomparison.R (1)

139-147: Validate Variance before computing weights (duplicate of prior ask)

Guard against zero/negative/Inf variances to avoid invalid weights and model errors.

Apply:

-    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
-        weight_input = 1/input$Variance
+    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
+        if (any(input$Variance <= 0 | is.infinite(input$Variance), na.rm = TRUE)) {
+            msg = paste("*** warning: Invalid variance values detected for protein",
+                        unique(input$Protein), "- weights will not be applied")
+            getOption("MSstatsLog")("WARN", msg)
+            weight_input = NULL
+        } else {
+            weight_input = 1/input$Variance
+        }
     } else {
         weight_input = NULL
     }
🧹 Nitpick comments (11)
R/utils_output.R (1)

33-33: Use a named argument for the new 7th parameter in the example.
Keeps the example resilient to future signature shifts and clearer for users.

Apply:

-#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, 100)
+#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, aft_iterations = 100)

Please confirm the exported name is indeed aft_iterations.

R/dataProcess.R (4)

36-37: Clarify requirement wording for anomaly detection + linear summarization

Suggest making it explicit that anomaly-derived weights are only supported with summaryMethod = "linear", and that TMP will ignore/anonymize anomaly weights.

Apply:

-#' which is robust estimation method. "linear" uses linear mixed model. If 
-#' anomaly detection algorithm is performed, "linear" must be used.
+#' which is a robust estimation method. "linear" uses a (mixed) linear model.
+#' When anomaly detection is used (weights present), summaryMethod must be "linear";
+#' TMP does not apply anomaly-derived weights.

289-290: Examples align with new signatures

Examples compile with aft_iterations in place; consider adding one example that sets equal_variances = FALSE.

Also applies to: 355-356


394-399: Optional: normalize weights to mean 1 to stabilize optimization

If the downstream model uses observation-level weights, scaling weights to mean 1 can improve numerical stability without changing estimates.

Example:

w <- 1 / single_protein$ANOMALYSCORES
single_protein$weights <- w / mean(w, na.rm = TRUE)

333-334: R CMD check NOTE noise: declare globals

Consider adding LABEL, n_obs, n_obs_run, censored, predicted, imputation_var to the NULL globals list to suppress NOTES.

man/MSstatsSummarizeSingleLinear.Rd (2)

27-28: Grammar fix in argument description

“observation are” → “observations are”.

Apply in the roxygen source (R/dataProcess.R):

-#' @param equal_variances if TRUE, observation are assumed to be homoskedastic
+#' @param equal_variances if TRUE, observations are assumed to be homoskedastic

36-51: Docs match new signature; consider documenting weights behavior

Since anomaly-derived weights are now used in linear summarization, add a sentence in Description or Details noting how weights are interpreted (e.g., inverse anomaly scores, expected to be positive/finite).

R/utils_groupcomparison.R (4)

139-147: Optional: stabilize weights and avoid silent row drops

  • Extremely small variances can dominate fits; scale weights to mean=1 and cap extremes.
  • If any NA weights remain, lm/lmer silently drop rows; consider warning with counts and, if many are dropped, disabling weights.

Example:

-        } else {
-            weight_input = 1/input$Variance
+        } else {
+            w <- 1/input$Variance
+            invalid <- !is.finite(w) | w <= 0
+            if (any(invalid)) {
+                getOption("MSstatsLog")("WARN",
+                   paste("*** warning:", sum(invalid), "observations have invalid weights and will be dropped"))
+                w[invalid] <- NA_real_
+            }
+            # stabilize
+            w <- w / mean(w, na.rm = TRUE)
+            # optional cap
+            w <- pmin(w, quantile(w, 0.99, na.rm = TRUE))
+            weight_input <- w
         }

148-152: Use scalar OR (||) to avoid vectorized logic surprises

Use || for single boolean flags.

-        if (!has_tech_replicates | is_single_subject) {
+        if (!has_tech_replicates || is_single_subject) {

139-147: Potential mismatch between modeled data and downstream checks when weights drop rows

If NA/invalid weights cause row exclusion, .getAllComparisons() still uses the original input to assess empty conditions and build contrasts. Consider passing the actual model frame rows (or a filtered input aligned to weights) into .getAllComparisons() to avoid inconsistencies.

Would you like me to draft a minimal change that derives the model frame with model.frame(full_fit) and threads it through to .getAllComparisons()?

Also applies to: 280-296


466-479: Parallel workers: pre-load namespaces for stability and perf

Exporting functions works, but pre-loading packages on workers avoids first-call overhead and surprises with S3/S4 methods.

   cl = parallel::makeCluster(numberOfCores)
+  parallel::clusterEvalQ(cl, {
+    library(lme4)
+    library(data.table)
+    # library(MSstats) # optional if running from installed package
+    NULL
+  })
   parallel::clusterExport(cl, c("MSstatsGroupComparisonSingleProtein", 
                                  "contrast_matrix", "repeated", 
                                  "groups", "samples_info", 
                                  "save_fitted_models", "has_imputed"), 
                            envir = function_environment)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcdfd78 and beecaae.

📒 Files selected for processing (12)
  • R/dataProcess.R (15 hunks)
  • R/groupComparison.R (3 hunks)
  • R/utils_censored.R (1 hunks)
  • R/utils_groupcomparison.R (4 hunks)
  • R/utils_output.R (3 hunks)
  • man/MSstatsSummarizationOutput.Rd (2 hunks)
  • man/MSstatsSummarizeSingleLinear.Rd (2 hunks)
  • man/MSstatsSummarizeSingleTMP.Rd (3 hunks)
  • man/MSstatsSummarizeWithMultipleCores.Rd (2 hunks)
  • man/MSstatsSummarizeWithSingleCore.Rd (3 hunks)
  • man/dataProcess.Rd (3 hunks)
  • src/linear_summary.cpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • R/utils_censored.R
  • man/MSstatsSummarizeWithMultipleCores.Rd
  • man/dataProcess.Rd
  • man/MSstatsSummarizationOutput.Rd
  • man/MSstatsSummarizeWithSingleCore.Rd
  • man/MSstatsSummarizeSingleTMP.Rd
  • R/groupComparison.R
  • src/linear_summary.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (10)
R/utils_output.R (1)

13-16: Docs return block is concise and accurate.
Matches the actual return structure of MSstatsSummarizationOutput.

R/dataProcess.R (6)

131-132: API change looks good; params propagated to validation

Adding aft_iterations to the signature and passing colnames(raw) to .checkDataProcessParams is consistent. No action needed.

Also applies to: 142-144


480-500: Nice: convergence handling for TMP imputation

Catches convergence warnings and sets predicted to NA; consistent with robust workflows.


209-212: Threading aft_iterations end-to-end looks correct

Signatures updated and forwarded through both parallel and single-core paths.

Also applies to: 256-261, 294-295


205-206: Note: ensure parallel imports cover parApply variants used

The @importFrom tag lists makeCluster/parLapply/stopCluster/clusterExport; OK as is.


422-427: Good: use vcov(fit) and pass cov_mat to get_linear_summary

This aligns with variance propagation into DA.


366-387: No changes needed for column naming consistency
The code defines the logical censored flag and then creates a numeric cen indicator (0 for censored, 1 otherwise) for use in the Surv model. Both names coexist by design and are correctly applied. (rdrr.io)

R/utils_groupcomparison.R (3)

89-93: LGTM: signature formatting only

No functional change; keeping params stable is good for API compatibility.


168-169: LGTM: weighted lm for single-subject time-course

Appropriate to thread weights through this branch.


505-506: LGTM: single-core signature formatting

No behavior change; keeps API aligned with multi-core path.

Comment on lines +388 to +393
survival = single_protein[, c(cols, "predicted"), with = FALSE]
} else {
survival = single_protein[, cols, with = FALSE]
survival[, predicted := NA]
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Preserve and return imputation variance with survival output

imputation_var is computed but dropped from survival; include it for downstream DA that uses weights/variance.

Apply:

-        survival = single_protein[, c(cols, "predicted"), with = FALSE]
+        survival = single_protein[, c(cols, "predicted", "imputation_var"), with = FALSE]
@@
-        survival = single_protein[, cols, with = FALSE]
-        survival[, predicted := NA]
+        survival = single_protein[, cols, with = FALSE]
+        survival[, `:=`(predicted = NA_real_, imputation_var = NA_real_)]

Also applies to: 429-432

🤖 Prompt for AI Agents
In R/dataProcess.R around lines 388-393, the imputation_var column is computed
but not included in the survival data.frame; update both branches so survival
includes imputation_var (e.g., select cols plus "predicted" and "imputation_var"
in the first branch, and cols plus "imputation_var" in the else branch while
still adding predicted := NA), and make the identical change at lines ~429-432
to ensure imputation variance is preserved and returned for downstream DA
weighting/variance use.

Comment on lines +410 to +416
# fit = try(, silent = TRUE)
fit = .fitLinearModel(single_protein, is_single_feature,
is_labeled = label, equal_variances)

if (inherits(fit, "try-error")) {
msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN))
msg = paste("*** error : can't fit the model for ",
unique(single_protein$PROTEIN))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Restore try/tryCatch around .fitLinearModel; current error check is ineffective

fit is not wrapped in try, yet you test inherits(fit, "try-error"). A model failure will raise and skip the handler. Wrap the call.

Apply:

-    # fit = try(, silent = TRUE)
-    fit = .fitLinearModel(single_protein, is_single_feature, 
-                          is_labeled = label, equal_variances)
+    fit <- try(
+        .fitLinearModel(single_protein, is_single_feature, 
+                        is_labeled = label, equal_variances),
+        silent = TRUE
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# fit = try(, silent = TRUE)
fit = .fitLinearModel(single_protein, is_single_feature,
is_labeled = label, equal_variances)
if (inherits(fit, "try-error")) {
msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN))
msg = paste("*** error : can't fit the model for ",
unique(single_protein$PROTEIN))
fit <- try(
.fitLinearModel(single_protein, is_single_feature,
is_labeled = label, equal_variances),
silent = TRUE
)
if (inherits(fit, "try-error")) {
msg = paste("*** error : can't fit the model for ",
unique(single_protein$PROTEIN))
...
🤖 Prompt for AI Agents
In R/dataProcess.R around lines 410-416, the call to .fitLinearModel is not
wrapped in try/tryCatch yet the code tests inherits(fit, "try-error"), so model
failures will raise and skip the handler; wrap the .fitLinearModel call in
try(..., silent = TRUE) or use tryCatch to catch errors and return an object
that can be detected (e.g., assign the error to fit with class "try-error" or a
sentinel), then keep the existing inherits(fit, "try-error") branch to handle
failures and log the message as before.

Comment on lines +139 to +147
.fitModelForGroupComparison = function(input, repeated,
is_single_subject, has_tech_replicates) {

if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
weight_input = 1/input$Variance
} else {
weight_input = NULL
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Sanity-check: tests assume full_fit$weights exists

For lm this holds; for lmer use weights(full_fit). Ensure tests only assert on lm path or adapt accordingly.

Run:


🏁 Script executed:

#!/bin/bash
# Locate tests that access full_fit$weights and confirm they don't assume lmer objects
rg -nP -C2 'full_fit\\$weights|weights\\(full_fit\\)' inst tests

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List directory structure to locate test folders
fd -t d .

# Search for direct references to full_fit$weights in test scripts
rg -n 'full_fit\$weights' -C2 --glob '*test*.R'

# Search for calls to weights(full_fit) in test scripts
rg -n 'weights\s*\(\s*full_fit\s*\)' -C2 --glob '*test*.R'

Length of output: 871


Adapt tests to handle weights in lmer models
Inst/tinytest/test_utils_groupComparison_model.R (lines 14, 17) reference result$full_fit$weights, which exists for lm fits but not for lmer. Either assign full_fit$weights <- weights(full_fit) in the mixed-effects branch or update the tests to call weights(full_fit) only on the lmer path.

🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 139-147, the code sets weight_input
but downstream tests expect result$full_fit$weights to exist for lm fits while
lmer fits don't expose that field; in the mixed-effects branch assign the model
weights into the fitted object (e.g. set full_fit$weights <- weights(full_fit)
after fitting the lmer model) so result$full_fit$weights is present, or
alternatively update the tests to call weights(full_fit) only for lmer
objects—implement the former by adding that assignment immediately after the
lmer fit.

Comment on lines 153 to 163
full_fit = suppressMessages(try(
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input),
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,
lm(ABUNDANCE ~ GROUP + SUBJECT,
data = input, weights=weight_input)$df.residual,
TRUE
))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden df_full computation when lmer path is used

If the lm fallback fails, df_full becomes a try-error and later code treats it as numeric. Provide a safe fallback.

-            df_full = suppressMessages(try(
-                lm(ABUNDANCE ~ GROUP + SUBJECT, 
-                   data = input, weights=weight_input)$df.residual,
-                TRUE
-            ))
+            df_full = suppressMessages(try(
+                lm(ABUNDANCE ~ GROUP + SUBJECT, 
+                   data = input, weights=weight_input)$df.residual,
+                TRUE
+            ))
+            if (inherits(df_full, "try-error")) {
+                df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+                                    error = function(e) NA_real_)
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
full_fit = suppressMessages(try(
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input),
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,
lm(ABUNDANCE ~ GROUP + SUBJECT,
data = input, weights=weight_input)$df.residual,
TRUE
))
}
full_fit = suppressMessages(try(
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT,
data = input, weights=weight_input)$df.residual,
TRUE
))
if (inherits(df_full, "try-error")) {
df_full <- tryCatch(
stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
error = function(e) NA_real_
)
}
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 153 to 163, the df_full assignment
from the lm fallback can produce a "try-error" when the lm call fails but later
code assumes df_full is numeric; update the logic to detect a failed try (e.g.
inherits(..., "try-error") or is.null) and provide a safe numeric fallback (such
as NA_real_ or a computed residual df from model dimensions) before it is used
downstream; ensure the check covers both full_fit and df_full failure cases and
that df_full is explicitly coerced to numeric when successful.

Comment on lines +174 to 181
lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,
lm(ABUNDANCE ~ GROUP + SUBJECT,
data = input, weights=weight_input)$df.residual,
TRUE))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Repeat df_full guard in this branch

Mirror the safe fallback after the lm() try, as above.

-                df_full = suppressMessages(try(
-                    lm(ABUNDANCE ~ GROUP + SUBJECT, 
-                       data = input, weights=weight_input)$df.residual,
-                    TRUE))
+                df_full = suppressMessages(try(
+                    lm(ABUNDANCE ~ GROUP + SUBJECT, 
+                       data = input, weights=weight_input)$df.residual,
+                    TRUE))
+                if (inherits(df_full, "try-error")) {
+                    df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+                                        error = function(e) NA_real_)
+                }
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 174 to 181, the df_full assignment
uses try(... )$df.residual directly which will fail when try returns an error;
mirror the safe fallback used earlier by capturing the try result, check for a
"try-error" (or use inherits(..., "try-error")) and if an error set df_full to
the same safe fallback (e.g., NA or 0) otherwise extract $df.residual; update
this branch to follow the same guard pattern as the lm() try above so df_full is
always defined safely.

Comment on lines +184 to 192
lme4::lmer(
ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
data = input)$df.residual,
data = input, weights=weight_input)$df.residual,
TRUE
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

And here as well: protect df_full and keep behavior consistent

Add the same fallback computation for df_full in the tech-replicate branch.

-                df_full = suppressMessages(try(
-                    lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
-                       data = input, weights=weight_input)$df.residual,
-                    TRUE
-                ))
+                df_full = suppressMessages(try(
+                    lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
+                       data = input, weights=weight_input)$df.residual,
+                    TRUE
+                ))
+                if (inherits(df_full, "try-error")) {
+                    df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+                                        error = function(e) NA_real_)
+                }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
lme4::lmer(
ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT),
data = input, weights=weight_input),
TRUE
))
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
data = input)$df.residual,
data = input, weights=weight_input)$df.residual,
TRUE
# existing try to compute residual df from lm
df_full = suppressMessages(try(
lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
data = input, weights=weight_input)$df.residual,
TRUE
))
# fallback if lm() fails
if (inherits(df_full, "try-error")) {
df_full <- tryCatch(
stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
error = function(e) NA_real_
)
}
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 184 to 192, df_full in the
tech-replicate branch is not protected and lacks the same fallback used
elsewhere; wrap the lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, data =
input, weights=weight_input)$df.residual call in suppressMessages(try(...,
TRUE)) and if that returns an error, compute df_full using the identical
fallback logic used in the other branch (e.g., the same nrow/input-based
fallback calculation) so behavior is consistent and failures are handled
gracefully.

}
single_protein = input[protein_indices[[i]],]
MSstatsSummarizeSingleLinear(single_protein, equal_variance)
MSstatsSummarizeSingleLinear(
Copy link
Contributor

@tonywu1999 tonywu1999 Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equal_variance is not passed into MSstatsSummarizeSingleLinear

@Vitek-Lab Vitek-Lab deleted a comment from coderabbitai bot Sep 10, 2025
single_protein = input[protein_indices[[protein_id]],]
summarized_result = MSstatsSummarizeSingleLinear(single_protein,
equal_variance)
summarized_result = MSstatsSummarizeSingleLinear(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equal_variance is not passed into MSstatsSummarizeSingleLinear

@devonjkohler devonjkohler merged commit fea8488 into devel Sep 10, 2025
0 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants