Feature model weights #174

devonjkohler · 2025-09-09T13:57:18Z

Motivation and Context

Feature branch for MSstats+ adding weights to differential analysis if anomaly detection model used in the converter. Includes vignettes and tests.

Changes

AFT imputation for linear summarization model
anomaly model weights for linear summarization model
Carry variance to differential analysis for weights in DA

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
I have run the devtools::document() command after my changes and committed the added files

Summary by CodeRabbit

New Features
- Added global control for AFT fitting iterations; linear summarization now emits per-run variances and accepts covariance-based weighting. Anomaly scores enable variance/weighting and require linear summarization. Group comparison and linear fits use variance/weights when available. Censored-value handling now runs whenever imputation is enabled.
Documentation
- Updated docs/examples for new parameters and outputs; added MSstats+ vignette describing weighted workflows.
Tests
- Extensive new tests covering weighting, variance behavior, anomaly-score validation, and linear summarization.
Style
- Minor logging and formatting adjustments.

…MSstats into feature-model_weights

…son for anomaly score feature

test(anomaly_scores): Add unit tests for dataProcess and groupComparison for anomaly score feature

coderabbitai · 2025-09-09T13:57:26Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds variance-aware linear summarization and anomaly-derived weighting across the pipeline. Threads a new aft_iterations param and covariance matrices through R and C++ layers, updates get_linear_summary to return LogIntensities+Variance, relaxes censored handling, adds anomaly-weight utilities, updates docs/tests, and adds a vignette.

Changes

Cohort / File(s)	Summary
C++ linear summary API `R/RcppExports.R`, `src/RcppExports.cpp`, `src/linear_summary.cpp`	get_linear_summary signature expanded to accept `cov_mat`; returns a DataFrame with `LogIntensities` and `Variance`; computes variance via cᵀ Σ c; Rcpp wrapper and call registration updated to 5 args.
Data processing & summarization `R/dataProcess.R`, `R/utils_imputation.R`, `R/utils_censored.R`, `R/utils_checks.R`, `R/utils_summarization.R`, `R/utils_summarization_prepare.R`, `R/utils_output.R`	Adds `aft_iterations` param throughout summarization paths; threads `aft_iterations` into AFT survreg maxiter; relaxes TMP-only guard for censored handling; ensures `AnomalyScores` column presence and validation; uses `newABUNDANCE` and optional `weights` in linear fits; changes some prepare/finalize flows to prefer TMP.
Anomaly-weight utilities `R/utils_anomaly_weights.R`	New helpers `.normalize_anomalies` and `.calculate_weights` that normalize anomaly scores and derive per-row weights (PPCA path, edge-case handling; contains commented placeholder stats).
Model / groupComparison weighting `R/utils_groupcomparison.R`, `R/groupComparison.R`	Compute `weight_input = 1/Variance` when `Variance` available and pass `weights=weight_input` to lm/lmer variants across design branches; minor logging reflow.
Imputation & survival `R/utils_imputation.R`	`.fitSurvival` signature gains `aft_iterations` and uses it for survreg maxiter; callers must supply the new arg (one internal caller currently unchanged).
Summarization helpers `R/utils_summarization_prepare.R`, `R/utils_summarization.R`	Ensure `ANOMALYSCORES` exists; route prepare/finalize to TMP-centric helpers; add optional weighting in `.fitLinearModel` and switch non-labeled dependent var to `newABUNDANCE`.
Input validation `R/utils_checks.R`, `man/dot-checkDataProcessParams.Rd`	`.checkDataProcessParams` gains `input_columns`; error if `AnomalyScores` present but summaryMethod != "linear"; required columns list updated.
Docs (Rd) and examples `man/*.Rd` (multiple files listed)	Document `aft_iterations` and updated function usage/examples; update MSstatsSummarizationOutput return text and example calls to match new signatures.
Tests `inst/tinytest/test_dataProcess.R`, `inst/tinytest/test_groupComparison.R`, `inst/tinytest/test_utils_groupComparison_model.R`, `inst/tinytest/test_utils_summarization.R`	New/extended tests for anomaly-aware linear summarization, variance propagation, weight propagation in models, and groupComparison weighting; small test formatting fix.
Vignette `vignettes/MSstatsPlus.Rmd`	New vignette describing MSstats+ workflow: anomaly scoring, weighting, AFT settings, example pipelines and diagnostics.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant R as dataProcess / MSstatsSummarize*
  participant L as MSstatsSummarizeSingleLinear
  participant F as lm/lmer + vcov
  participant C as .Call(_MSstats_get_linear_summary)
  participant CPP as get_linear_summary (C++)

  R->>L: single_protein, impute, censored_symbol, remove50missing, aft_iterations, equal_variances
  L->>F: fit model (use newABUNDANCE / weights)
  F-->>L: coefficients + vcov (cov_mat)
  L->>C: input, coefs, counts, is_labeled, cov_mat
  C->>CPP: pass args
  CPP-->>C: DataFrame{LogIntensities, Variance}
  C-->>L: DataFrame
  L-->>R: run-level data (+ Variance) and survival-related data

sequenceDiagram
  autonumber
  participant GC as groupComparison
  participant FM as .fitModelForGroupComparison
  participant M as lm/lmer

  GC->>FM: summarized input (may include Variance)
  FM->>FM: if Variance present -> weight_input = 1/Variance else NULL
  FM->>M: fit model with weights=weight_input
  M-->>FM: weighted fit (coeffs, SE, p-values)
  FM-->>GC: results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Feature model weights #174 — implements similar changes: adds cov_mat parameter and variance output to get_linear_summary, threads AFT/weighting changes through summarization pipeline.

Suggested labels

Review effort 2/5

Suggested reviewers

mstaniak

Pre-merge checks (2 warnings, 1 inconclusive)

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The current description includes a Motivation section and high-level Change bullets but omits detailed enumeration of the many signature updates, utility functions, documentation edits, vignette additions, and specific tests implemented, and the Testing section remains a placeholder rather than listing the actual unit tests.	Please expand the Changes section with detailed bullet points for each API, documentation, and test update, and fill in the Testing section by listing the specific new or modified unit tests and vignette validations.
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title Check	❓ Inconclusive	The title “Feature model weights” is brief but too generic to convey the primary change, which is introducing anomaly-derived weighting and variance propagation into the summarization and group-comparison workflows; it does not reference anomaly scores or differential analysis and could be misinterpreted.	Please revise the title to clearly reflect the main feature, for example “Add anomaly-derived weights and variance propagation to group comparison” or similar.

Poem

A nibble of code, a hop through the night,
I weighted the runs till variances felt right.
With cov-mats tucked snug and AFT set to run,
I balanced the scores till the summaries shone.
Thump-thump — MSstats+: lighter, clever, and fun. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between beecaae and 5b972e7.

📒 Files selected for processing (1)

vignettes/MSstatsPlus.Rmd (1 hunks)

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feature-model_weights

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 24

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (10)

R/utils_censored.R (4)
36-44: Guard against empty quantiles / zero IQR producing NaN cutoff and NA flags.

When no data or IQR == 0, cutoff_lower becomes NaN; downstream comparisons yield NA and can cascade into errors. Compute vals first, handle IQR <= eps, and allow a NA cutoff to skip quantile-based censoring.
-            quantiles = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", 
-                              quantile(ABUNDANCE, 
-                                       prob = c(0.01, 0.25, 0.5, 0.75, 
-                                                censored_cutoff), 
-                                       na.rm = TRUE)]
-            iqr = quantiles[4] - quantiles[2]
-            multiplier = (quantiles[5] - quantiles[4]) / iqr
-            cutoff_lower = (quantiles[2] - multiplier * iqr) 
+            vals = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", ABUNDANCE]
+            if (length(vals)) {
+                quantiles = quantile(vals,
+                                     prob = c(0.01, 0.25, 0.5, 0.75, censored_cutoff),
+                                     na.rm = TRUE, names = FALSE)
+                iqr = quantiles[4] - quantiles[2]
+                if (!is.finite(iqr) || iqr <= .Machine$double.eps) {
+                    cutoff_lower = quantiles[2]
+                } else {
+                    multiplier = (quantiles[5] - quantiles[4]) / iqr
+                    cutoff_lower = quantiles[2] - multiplier * iqr
+                }
+            } else {
+                cutoff_lower = NA_real_
+            }
44-46: Prevent NA-propagation in the censoring predicate.

Ensure cutoff_lower is finite before vectorized comparison to avoid NA censor flags.
-            input$censored = !is.na(input$INTENSITY) & 
-                input$LABEL == "L" &
-                input$ABUNDANCE < cutoff_lower
+            input$censored = is.finite(cutoff_lower) &
+                !is.na(input$INTENSITY) &
+                input$LABEL == "L" &
+                input$ABUNDANCE < cutoff_lower
47-54: Fix if(NA) runtime risk when cutoff_lower is NA.

if (cutoff_lower <= 0 & ...) can become if (NA) and error. Also prefer && for scalars; outer guard already ensures missing_symbol is non-null.
-            if (cutoff_lower <= 0 & !is.null(missing_symbol) & missing_symbol == "0") {
+            if (is.finite(cutoff_lower) && missing_symbol == "0" && cutoff_lower <= 0) {
                 zero_one_filter = !is.na(input$ABUNDANCE) & input$ABUNDANCE <= 0
                 input$censored = ifelse(zero_one_filter, TRUE, input$censored)
             }
-            if (!is.null(missing_symbol) & missing_symbol == "NA") {
+            if (missing_symbol == "NA") {
                 input$censored = ifelse(is.na(input$INTENSITY), TRUE, 
                                         input$censored)
             }
32-79: Restrict censored‐flagging to TMP or properly support linear summarization.

MSstatsHandleMissing() ignores summary_method, so with impute=TRUE non-TMP paths now get input$censored marks.

Both .prepareSummary() (always using .prepareTMP()) and MSstatsSummarizeSingleLinear() apply AFT imputation based on those flags.

Either guard MSstatsHandleMissing() with if (summary_method=="TMP") (or force impute=FALSE for non-TMP), or re-enable and complete .prepareLinear()/.finalizeLinear() branches and update docs to reflect linear AFT support.
R/utils_summarization.R (1)
210-216: WLS not applied: lm() re-fit uses a NULL symbol instead of the computed column.

wls.fit = lm(..., weights = weight) references a local NULL variable, not input$weight. This disables the intended heteroscedasticity correction.

Apply:
-        wls.fit = lm(formula(fit), data = input, weights = weight)
+        wls.fit = lm(formula(fit), data = input, weights = input$weight)
R/utils_imputation.R (2)
15-15: Avoid set.seed() in package code.

Setting the global RNG seed is a side effect that disrupts users. survreg() is deterministic; remove the seed.

Apply:
-    set.seed(100)
+    # no RNG needed here
60-65: .addSurvivalPredictions fails to pass aft_iterations.

This will error after the signature change or silently use the default if added. Thread the parameter through.

Apply:
-.addSurvivalPredictions = function(input) {
+.addSurvivalPredictions = function(input, aft_iterations = 90) {
     LABEL = NULL
     
-    survival_fit = .fitSurvival(input[LABEL == "L", ])
+    survival_fit = .fitSurvival(input[LABEL == "L", ], aft_iterations)
     predict(survival_fit, newdata = input)
 }
man/MSstatsSummarizeSingleTMP.Rd (1)
18-27: Fix user-facing typos and terminology (AFT)

“Accelated failure model.” → “Accelerated Failure Time (AFT) model.”

“intensites” → “intensities.”

“Null assumes …” → “NULL assumes …” (R constant).

Apply:
-\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
-TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. 
+\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. 
+TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) using an Accelerated Failure Time (AFT) model. 
 FALSE uses the values assigned by cutoffCensored}
@@
-Null assumes that all NA intensites are randomly missing.}
+NULL assumes that all NA intensities are randomly missing.}
R/groupComparison.R (1)
91-96: Robust labeled check and remove dead code

data$FeatureLevelData$Label is inconsistent with later use of LABEL and may be NULL for minimal inputs. Either fix case or guard; also labeled isn’t used later here.
-    labeled = data.table::uniqueN(data$FeatureLevelData$Label) > 1
+    labeled = FALSE
+    if (!is.null(data$FeatureLevelData) &&
+        "LABEL" %in% colnames(data$FeatureLevelData)) {
+        labeled = data.table::uniqueN(data$FeatureLevelData$LABEL) > 1
+    }
Optionally drop the variable entirely if unused.
src/linear_summary.cpp (1)
184-186: Avoid NA propagation in contrast construction

NumericVector default-initializes to NA; leaving other entries as NA risks NaN results in get_quant/get_quant_var. Initialize to zeros.
-        NumericVector contrast_matrix(num_runs);
+        NumericVector contrast_matrix(num_runs, 0.0);
         contrast_matrix[run_id] = 1;
-        NumericVector contrast_matrix(num_runs);
+        NumericVector contrast_matrix(num_runs, 0.0);
         contrast_matrix[num_runs - 1] = 1;
Also applies to: 197-199

🧹 Nitpick comments (27)

R/utils_censored.R (2)
61-66: Avoid logging NaN/Inf cutoff values.

Only log cutoff when finite; still log the generic symbol message.
-            getOption("MSstatsMsg")("INFO", msg)
-            getOption("MSstatsMsg")("INFO", msg_2)
-            getOption("MSstatsLog")("INFO", msg)
-            getOption("MSstatsLog")("INFO", msg_2)
+            if (is.finite(cutoff_lower)) {
+                getOption("MSstatsMsg")("INFO", msg)
+                getOption("MSstatsLog")("INFO", msg)
+            }
+            getOption("MSstatsMsg")("INFO", msg_2)
+            getOption("MSstatsLog")("INFO", msg_2)
68-75: Minor: prefer && for scalar checks.

Use short-circuit operators in scalar conditionals for clarity and to skip unnecessary RHS work.
-            if (missing_symbol == '0') {
+            if (missing_symbol == '0') {
                 input$censored = input$LABEL == "L" & 
                     !is.na(input$INTENSITY) &
                     (input$INTENSITY == 1 | input$ABUNDANCE <= 0)
-            } else if (missing_symbol == 'NA') {
+            } else if (missing_symbol == 'NA') {
                 input$censored = input$LABEL == "L" & is.na(input$ABUNDANCE)
             }
R/utils_summarization.R (2)
136-143: Redundant branches in non-missing filter.

Both branches under censored_symbol == "NA" and else are identical. Simplify.

Apply:
-    if (!is.null(censored_symbol)) {
-        if (censored_symbol == "NA") {
-            nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored
-        } else {
-            nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored 
-        }
-    } else {
-        nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY)
-    }
+    if (!is.null(censored_symbol)) {
+        nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored
+    } else {
+        nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY)
+    }
211-213: Typo in comment.

“valuaes” → “values”.

Apply:
-        ## loess fitted valuaes are predicted sd
+        ## loess fitted values are predicted sd
R/utils_anomaly_weights.R (3)
13-16: Optional: validate required columns early.

Fail fast if ANOMALYSCORES, PEPTIDE, censored, or imputation_var are missing.

Proposed prelude:
required_cols <- c("ANOMALYSCORES","PEPTIDE","censored","imputation_var")
missing <- setdiff(required_cols, names(data))
if (length(missing)) stop("Missing columns: ", paste(missing, collapse=", "))
1-11: Consider simpler normalization.

Subtract per-PEPTIDE min without adding global_min; this guarantees non-negativity and preserves within-peptide contrasts.

Apply:
-    global_min = min(data$ANOMALYSCORES, na.rm = TRUE)
-    data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE]
-    data[, normalized_score := ANOMALYSCORES - local_min + global_min]
+    data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE]
+    data[, normalized_score := ANOMALYSCORES - local_min]
1-1: Add data.table import for merge method (if not already).

If not present elsewhere, add roxygen import to ensure data.table methods are available.

Add at file top:
#' @import data.table
R/utils_checks.R (2)
154-157: Use numeric NA for AnomalyScores.

Set a numeric type to avoid downstream type coercion surprises and to align with later NA_real_ usage.

Apply:
-  input$AnomalyScores = NA
+  input$AnomalyScores = NA_real_
198-202: Minor: improve user-facing error message grammar.

Polish for clarity.
-            "So far, only label-free or reference-labeled experiment are supported. - stop"))
+            "So far, only label-free or reference-labeled experiments are supported. - stop"))
Also applies to: 204-207
R/utils_summarization_prepare.R (1)
30-33: Ensure ANOMALYSCORES is numeric.

Match utils_checks.R (NA_real_) to keep a consistent numeric type.
-    if (!"ANOMALYSCORES" %in% colnames(input)) {
-        input[, ANOMALYSCORES := NA]
-    }
+    if (!"ANOMALYSCORES" %in% colnames(input)) {
+        input[, ANOMALYSCORES := NA_real_]
+    }
R/RcppExports.R (1)

12-14: Add cov_mat validation in get_linear_summary
In R/RcppExports.R, before the .Call, assert that cov_mat is a square matrix with ncol(cov_mat) == length(coefs) and is symmetric, otherwise stop with a clear error.

inst/tinytest/test_utils_groupComparison_model.R (1)

16-18: Harden the test against edge cases (zero/NA variances, numeric fuzz).

Add a case with 0/NA Variance to assert the model either guards or errors clearly; also consider a tolerance in expect_equal to avoid float issues.
man/MSstatsSummarizeSingleTMP.Rd (1)
53-55: Prefer named argument in example to guard against signature drift

Use a named parameter for aft_iterations in the example.
-single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]],
-                                                   impute, cens, FALSE, 100)
+single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]],
+                                                   impute, cens, FALSE,
+                                                   aft_iterations = 100)
inst/tinytest/test_utils_summarization.R (3)
1-29: Synthetic fixture looks good; minor cleanup optional

ANOMALYSCORES isn’t used in this unit and could be dropped to keep the fixture minimal. No action required if you prefer the explicitness.

31-35: Equality check is fine; consider a tolerance for portability

Direct equality on numeric vectors can be brittle across platforms. Add a tiny tolerance.
-expect_equal(summarized$weights, single_protein$weights,
-             info = "Weights from model should match input weights")
+expect_equal(summarized$weights, single_protein$weights, tolerance = 1e-12,
+             info = "Weights from model should match input weights")
41-47: Keying into coefficients by name is brittle

If internal names change, this will break. Optionally guard with expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients)) before comparing, or retrieve by matching the feature column.
+expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients))
R/groupComparison.R (1)
97-101: Tighten log/message phrasing

“Start to test and get inference in whole plot” is awkward. Shorten and unify LOG/MSG.
-    getOption("MSstatsLog")(
-        "INFO", "== Start to test and get inference in whole plot")
-    getOption("MSstatsMsg")(
-        "INFO", " == Start to test and get inference in whole plot ...")
+    getOption("MSstatsLog")("INFO", "== Starting group comparison on all proteins")
+    getOption("MSstatsMsg")("INFO", " == Starting group comparison on all proteins ...")
inst/tinytest/test_groupComparison.R (2)
49-54: Use a small tolerance for numeric equality

Floating-point quirks can cause spurious failures across platforms.
-expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, 
+expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, tolerance = 1e-12, 
              info = "SE values for Q9UFW8 should be the same")
-expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue,
+expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue, tolerance = 1e-12,
              info = "pvalue values for Q9UFW8 should be the same")
59-62: Inequality assertions are good; optionally add tolerance bands

If borderline, consider asserting a minimum relative/absolute difference to avoid flakes.
inst/tinytest/test_dataProcess.R (1)
29-48: Consider setting a seed before imputation

If AFT or related steps introduce any stochasticity, set a seed for stability.
+# Ensure deterministic behavior if any stochastic steps are used
+set.seed(42)
 msstats_input = data.table::data.table(
src/linear_summary.cpp (1)
193-194: Numerical safety: clamp tiny negative variances to 0

Floating-point noise can yield slightly negative values. Clamp to non-negative.
-        variances[run_id] = get_quant_var(contrast, cov_mat);
+        variances[run_id] = std::max(0.0, get_quant_var(contrast, cov_mat));
-        variances.push_back(get_quant_var(contrast, cov_mat));
+        variances.push_back(std::max(0.0, get_quant_var(contrast, cov_mat)));
Also applies to: 203-204
vignettes/MSstatsPlus.Rmd (4)
116-123: Standardize column names or use named args for clarity

The example uses sanitized column names (FGShapeQualityScore(MS2)) while earlier text uses raw Spectronaut headers with spaces/dots. Clarify that the converter standardizes names, or standardize explicitly to avoid user confusion.
 msstats_input = MSstatsConvert::SpectronauttoMSstatsFormat(
     spectronaut_raw, annotation, intensity = 'PeakArea',
     excludedFromQuantificationFilter = TRUE, # Pre-filtering
     calculateAnomalyScores=TRUE, # Key parameter to use MSstats+
-    anomalyModelFeatures=c("FGShapeQualityScore(MS2)",
-                           "FGShapeQualityScore(MS1)",
-                           "EGDeltaRT"), # Quality metrics
+    # Use standardized column names (no dots/spaces) or call name standardizer
+    anomalyModelFeatures=c("FGShapeQualityScore(MS2)",
+                           "FGShapeQualityScore(MS1)",
+                           "EGDeltaRT"), # Quality metrics (standardized)
And add a short note right below explaining that MSstatsConvert standardizes feature names (dots/spaces removed), or show how to map raw headers to standardized ones.

347-362: Fix axis labels in the fold-change plot

Labels don’t match aesthetics (x=Protein, y=abs(log2FC)).
 ggplot(data=merged_results) + 
     geom_col(aes(x=Protein, y=abs(log2FC), fill=Model), position="dodge") +
     geom_hline(yintercept=1, linetype="dashed", color="black", linewidth=1) +
     theme_bw(base_size =16) + 
     labs(title="Log fold change comparison",
-         x="MSstats+",
-         y="Base MSstats")
+         x="Protein",
+         y="|log2FC|")
367-373: Fix axis labels in the SE plot

Same issue as above.
 ggplot(data=merged_results) + 
     geom_col(aes(x=Protein, y=SE, fill=Model), position="dodge") +
     theme_bw(base_size =16) + 
     labs(title="Standard error comparison",
-         x="MSstats+",
-         y="Base MSstats")
+         x="Protein",
+         y="SE")
260-265: Use named arguments for MSstatsAnomalyScores example

Unlabeled positional args reduce readability and risk misordering.
 input = MSstatsConvert::MSstatsAnomalyScores(
-    msstats_input, 
-    c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), 
-    c("mean_decrease", "mean_decrease", "dispersion_increase"),
-    .5, 100, run_order, 100, "auto", 1)
+    data = msstats_input, 
+    features = c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), 
+    featureTemporal = c("mean_decrease", "mean_decrease", "dispersion_increase"),
+    trainQuantile = 0.5, sampleSize = 100, runOrder = run_order,
+    n_trees = 100, max_depth = "auto", numberOfCores = 1)
R/utils_groupcomparison.R (1)
466-478: Validate has_imputed attribute presence and type
After retrieving has_imputed <- attr(summarized_list, "has_imputed"), add a check to ensure it exists and is a length-one logical, for example:
if (!is.logical(has_imputed) || length(has_imputed) != 1) {
  stop("`summarized_list` must have a single logical `has_imputed` attribute")
}
This prevents unexpected NULL or non-logical values from causing runtime errors.
R/dataProcess.R (1)
128-129: Document the default value choice for aft_iterations

The default value of 90 iterations for AFT model fitting seems arbitrary. Consider documenting why this specific value was chosen or make it configurable based on convergence criteria.

Consider adding a comment explaining the rationale:
-    numberOfCores = 1, aft_iterations=90
+    numberOfCores = 1, aft_iterations=90  # Default based on empirical testing for typical proteomics datasets

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8479d28 and dcdfd78.

📒 Files selected for processing (26)

R/RcppExports.R (1 hunks)
R/dataProcess.R (15 hunks)
R/groupComparison.R (3 hunks)
R/utils_anomaly_weights.R (1 hunks)
R/utils_censored.R (1 hunks)
R/utils_checks.R (5 hunks)
R/utils_groupcomparison.R (4 hunks)
R/utils_imputation.R (2 hunks)
R/utils_output.R (2 hunks)
R/utils_summarization.R (1 hunks)
R/utils_summarization_prepare.R (2 hunks)
inst/tinytest/test_dataProcess.R (1 hunks)
inst/tinytest/test_groupComparison.R (1 hunks)
inst/tinytest/test_utils_groupComparison_model.R (1 hunks)
inst/tinytest/test_utils_summarization.R (1 hunks)
man/MSstatsSummarizationOutput.Rd (1 hunks)
man/MSstatsSummarize.Rd (1 hunks)
man/MSstatsSummarizeSingleLinear.Rd (2 hunks)
man/MSstatsSummarizeSingleTMP.Rd (2 hunks)
man/MSstatsSummarizeWithMultipleCores.Rd (1 hunks)
man/MSstatsSummarizeWithSingleCore.Rd (2 hunks)
man/dataProcess.Rd (3 hunks)
man/dot-checkDataProcessParams.Rd (2 hunks)
src/RcppExports.cpp (2 hunks)
src/linear_summary.cpp (4 hunks)
vignettes/MSstatsPlus.Rmd (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

R/RcppExports.R (1)

src/comparison_contrasts.cpp (1)

get_estimable_fixed_random (6-27)

inst/tinytest/test_groupComparison.R (1)

src/comparison_contrasts.cpp (1)

get_estimable_fixed_random (6-27)

src/RcppExports.cpp (1)

src/linear_summary.cpp (2)

get_linear_summary (169-210)

get_linear_summary (169-173)

src/linear_summary.cpp (1)

src/RcppExports.cpp (10)

contrast (21-21)

cov_mat (52-52)

input (32-32)

input (48-48)

coefs (33-33)

coefs (49-49)

counts (35-35)

counts (50-50)

is_labeled (36-36)

is_labeled (51-51)

🔇 Additional comments (31)

R/utils_summarization.R (1)

167-172: LGTM: add weights to non-labeled linear models.

Passing weights into the non-labeled single- and multi-feature lm() calls aligns with the PR’s goal.

R/utils_imputation.R (1)

20-49: LGTM: consistently using aft_iterations for survreg control.

Replacing hard-coded maxiter with aft_iterations across labeled/unlabeled paths is correct.

man/dot-checkDataProcessParams.Rd (1)

7-15: Approve docs update: R/utils_checks.R includes @param input_columns matching the function’s signature; re-run roxygen to regenerate the updated manual page.

R/utils_checks.R (2)

99-105: Good guard: disallow AnomalyScores with non-linear summary.

This check is appropriate and prevents silent misuse.

91-94: All call sites include the new input_columns parameter
Verified that R/dataProcess.R passes colnames(raw) as the seventh argument and the man page (man/dot-checkDataProcessParams.Rd) documents the updated signature.

man/MSstatsSummarizationOutput.Rd (1)

56-56: Example updated to 7-arg MSstatsSummarizeWithSingleCore call—LGTM.

inst/tinytest/test_utils_groupComparison_model.R (1)

1-19: Good: verifies inverse-variance weights are plumbed to full_fit.

R/utils_output.R (1)

35-35: Example updated to pass aft_iterations—LGTM.

man/MSstatsSummarizeWithSingleCore.Rd (1)

62-63: Example update to 7 args—LGTM.

man/MSstatsSummarizeSingleLinear.Rd (1)

41-41: Example aligns with new signature—LGTM.

R/groupComparison.R (3)

101-104: LGTM: purely formatting for readability

Argument wrapping improves readability; no functional change.

175-186: LGTM: early return for parallel path

The guard and formatting are fine.

275-287: LGTM: clearer multi-line call

Pure formatting. No functional concerns.

inst/tinytest/test_groupComparison.R (2)

19-21: Row-count parity assertion is good

Validates multi-core equivalence at the surface level.

22-36: Weighted vs unweighted fixtures look reasonable

Variance pattern should exercise weighting differences for Q96S19 and sameness for Q9UFW8.

inst/tinytest/test_dataProcess.R (8)

27-28: LGTM: basic sanity on tech reps/fractions

Good minimal check that FeatureLevelData is populated.

51-55: Error pattern check is appropriate

Pattern is specific enough; no change needed.

64-74: Good validations on Variance column

Covers presence, positivity, finiteness, and per-protein constancy (for Q9UFW8).

96-104: Comparative variance test is meaningful

Captures expected behavior shift with lower anomaly scores.

106-115: Absence-of-anomaly control is sound

Identity check guards against accidental reuse of the same variance vector.

149-156: Single-protein linear summarization: good coverage

Parameters (including aft_iterations and equal_variances) are exercised; shapes validated.

171-189: Variance stratification assertions are strong

Clear expectation that runs 4–5 have substantially higher variance.

190-210: Proximity check to feature-level abundances is a nice consistency test

Solid behavioral validation of weighted aggregation.

src/linear_summary.cpp (2)

106-112: Variance via cᵀΣc looks correct

Computation is standard and efficient with Armadillo. Nice addition.

206-209: Return schema LGTM

Two-column DataFrame with "LogIntensities" and "Variance" is clear and minimal.

src/RcppExports.cpp (1)

43-56: Wrapper update and registration look correct

Signature, argument forwarding, and CallEntries (arity=5) are consistent.

Please confirm R/RcppExports.R was re-generated and committed so the .Call arity matches (5 args). If helpful, I can generate a quick check script.

Also applies to: 74-74

R/utils_groupcomparison.R (1)

150-155: LGTM! Consistent weight propagation across model fitting paths

The implementation correctly applies variance-based weights to all linear and mixed-effects models across different experimental designs (not repeated, repeated, time-course). This ensures consistent weighting throughout the group comparison workflow.

R/dataProcess.R (4)

545-553: Good convergence handling implementation

The implementation properly handles AFT model convergence warnings by catching them and setting predicted values to NA when convergence fails. This prevents silent failures and ensures data integrity.

486-488: Resolved: get_linear_summary signature matches C++ implementation
Verified that the R call get_linear_summary(single_protein, cf, counts, label, cov_mat) aligns exactly with the C++ definition in src/linear_summary.cpp and its Rcpp export in src/RcppExports.cpp. No further action needed.

427-429: Ignore n_obs column existence concern

The columns n_obs and n_obs_run are generated in MSstatsPrepareForSummarization (via .prepareLinear/.prepareTMP in R/utils_summarization_prepare.R) before downstream filtering, so the runtime filter on those fields in dataProcess.R is valid.

Likely an incorrect or invalid review comment.

555-559: Ignore converged flag scope concern
The local variable converged is defined in MSstatsSummarizeSingleTMP and <<- within the warning handler correctly updates that same variable. No scoping issue remains.

Likely an incorrect or invalid review comment.

coderabbitai · 2025-09-09T14:10:40Z

inst/tinytest/test_groupComparison.R

+protein_level_data_no_anomalies = protein_level_data_with_anomalies
+protein_level_data_no_anomalies$Variance = NULL
+QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
+QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
+model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
+                        use_log_file = FALSE)
+model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
+                         use_log_file = FALSE)
+


💡 Verification agent

🧩 Analysis chain

Ensure minimal input structure matches groupComparison expectations

groupComparison() calls MSstatsPrepareForGroupComparison(), which references FeatureLevelData$LABEL. Provide a stub FeatureLevelData to avoid NULL access in minimal tests and use named arguments.

-protein_level_data_no_anomalies$Variance = NULL -QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies) -QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies) -model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly, - use_log_file = FALSE) -model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly, - use_log_file = FALSE) +protein_level_data_no_anomalies$Variance = NULL +stub_feature = data.table::data.table(LABEL = factor("L")) +QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies, + FeatureLevelData = stub_feature) +QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies, + FeatureLevelData = stub_feature) +model_no_anomaly = groupComparison(contrast.matrix = "pairwise", + data = QuantDataNoAnomaly, + use_log_file = FALSE) +model_with_anomaly = groupComparison(contrast.matrix = "pairwise", + data = QuantDataWithAnomaly, + use_log_file = FALSE)

🏁 Script executed:

#!/bin/bash # Locate the definition of groupComparison and its signature rg -nP 'groupComparison\s*<-' -g '*.R' # Locate MSstatsPrepareForGroupComparison and check for FeatureLevelData$LABEL usage rg -n 'MSstatsPrepareForGroupComparison' -g '*.R' -C5

Length of output: 6387

🏁 Script executed:

#!/bin/bash # Show lines around groupComparison usage of FeatureLevelData sed -n '80,110p' R/groupComparison.R # Show lines around MSstatsPrepareForGroupComparison and LABEL usage sed -n '120,150p' R/groupComparison.R

Length of output: 3049

Ensure tests supply a minimal FeatureLevelData stub
Add a stub FeatureLevelData with a LABEL column and include it in your QuantData lists; switch to named args for groupComparison():

-protein_level_data_no_anomalies$Variance = NULL -QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies) -QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies) -model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly, - use_log_file = FALSE) -model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly, - use_log_file = FALSE) +protein_level_data_no_anomalies$Variance = NULL +stub_feature = data.table::data.table(LABEL = factor("L")) +QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies, + FeatureLevelData = stub_feature) +QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies, + FeatureLevelData = stub_feature) +model_no_anomaly = groupComparison(contrast.matrix = "pairwise", + data = QuantDataNoAnomaly, + use_log_file = FALSE) +model_with_anomaly = groupComparison(contrast.matrix = "pairwise", + data = QuantDataWithAnomaly, + use_log_file = FALSE)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

protein_level_data_no_anomalies = protein_level_data_with_anomalies

protein_level_data_no_anomalies$Variance = NULL

QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)

QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)

model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,

use_log_file = FALSE)

model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,

use_log_file = FALSE)

protein_level_data_no_anomalies = protein_level_data_with_anomalies

protein_level_data_no_anomalies$Variance = NULL

# Minimal FeatureLevelData stub required by groupComparison()

stub_feature = data.table::data.table(LABEL = factor("L"))

QuantDataNoAnomaly = list(

ProteinLevelData = protein_level_data_no_anomalies,

FeatureLevelData = stub_feature

)

QuantDataWithAnomaly = list(

ProteinLevelData = protein_level_data_with_anomalies,

FeatureLevelData = stub_feature

)

model_no_anomaly = groupComparison(

contrast.matrix = "pairwise",

data = QuantDataNoAnomaly,

use_log_file = FALSE

)

model_with_anomaly = groupComparison(

contrast.matrix = "pairwise",

data = QuantDataWithAnomaly,

use_log_file = FALSE

)

🤖 Prompt for AI Agents

In inst/tinytest/test_groupComparison.R around lines 37 to 45, the tests call groupComparison with QuantData lists that lack a minimal FeatureLevelData stub and use positional args; add a FeatureLevelData element to both QuantDataNoAnomaly and QuantDataWithAnomaly containing at least a LABEL column (matching the peptides/features in ProteinLevelData), include it in the lists, and call groupComparison with named arguments (e.g., data = QuantDataNoAnomaly, design = "pairwise", use_log_file = FALSE) so the function receives the required FeatureLevelData and parameters explicitly.

coderabbitai · 2025-09-09T14:10:40Z

man/dataProcess.Rd

+  numberOfCores = 1,
+  aft_iterations = 90
 )


💡 Verification agent

🧩 Analysis chain

API/docs alignment: aft_iterations added; clarify scope and enforce anomaly→linear rule.

Ensure all public wrappers default aft_iterations = 90 consistently.

Explicitly state whether aft_iterations affects TMP only (AFT imputation) or also linear paths.

The text “if anomaly detection … ‘linear’ must be used” should be enforced in code (e.g., parameter validation) if not already.

To confirm consistency of defaults across the repo:

Also applies to: 71-73, 118-119

🏁 Script executed:

#!/bin/bash rg -nP '\baft_iterations\s*=\s*90\b' -C2

Length of output: 3445

Clarify aft_iterations scope and enforce anomaly→linear rule

Docs (e.g. man/dataProcess.Rd and related *.Rd) must state that aft_iterations only affects AFT/TMP imputation and has no effect on linear summarization.

In each MSstatsSummarize… function, add a parameter check to error if impute="anomaly" is used without method="linear".

coderabbitai · 2025-09-09T14:10:40Z

man/MSstatsSummarizationOutput.Rd

 output = output = MSstatsSummarizationOutput(input, summarized, processed,
 method, impute, cens)


⚠️ Potential issue

Fix duplicated assignment in example ("output = output =").

Change to a single assignment to avoid confusing users.

🤖 Prompt for AI Agents

In man/MSstatsSummarizationOutput.Rd around lines 57 to 58, there is a duplicated assignment in the example ("output = output = ..."); update this to a single assignment by removing the extra "output =" so the example shows a single, correct assignment of the function result to the variable.

coderabbitai · 2025-09-09T14:10:40Z

man/MSstatsSummarize.Rd

  impute,
  censored_symbol,
  remove50missing,
+  aft_iterations,


⚠️ Potential issue

Docs incomplete: aft_iterations lacks an argument entry and examples use the old signature.

Add an \item for aft_iterations and update examples to include it. Also fix typos in the imputation description.

\usage{ MSstatsSummarize( proteins_list, method, impute, censored_symbol, remove50missing, - aft_iterations, + aft_iterations, equal_variance ) } \arguments{ \item{proteins_list}{list of processed feature-level data} @@ -\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. -TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. +\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. +TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model. FALSE uses the values assigned by cutoffCensored} @@ -Null assumes that all NA intensites are randomly missing.} +Null assumes that all NA intensities are randomly missing.} + +\item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.} @@ -summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, TRUE) +summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

aft_iterations,

\usage{

MSstatsSummarize(

proteins_list,

method,

impute,

censored_symbol,

remove50missing,

aft_iterations,

equal_variance

)

}

\arguments{

\item{proteins_list}{list of processed feature-level data}

\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'.

TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model.

FALSE uses the values assigned by cutoffCensored}

\item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.}

}

\examples{

summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)

}

🤖 Prompt for AI Agents

In man/MSstatsSummarize.Rd around line 13, the documentation is missing an \item entry for the aft_iterations argument and the examples still use the old function signature; update the Rd file by adding a properly formatted \item{aft_iterations}{<brief description>} entry describing its purpose and default behavior, revise the Examples section to call MSstatsSummarize with the aft_iterations parameter included, and correct typographical errors in the imputation description (fix spelling and grammar) so the description reads clearly and accurately.

coderabbitai · 2025-09-09T14:10:40Z

man/MSstatsSummarizeSingleLinear.Rd

+MSstatsSummarizeSingleLinear(
+  single_protein,
+  impute,
+  censored_symbol,
+  remove50missing,
+  aft_iterations = 90,
+  equal_variances = TRUE
+)


⚠️ Potential issue

Arguments section incomplete for expanded signature.

Document impute, censored_symbol, remove50missing, aft_iterations, and equal_variances to match usage.

🤖 Prompt for AI Agents

In man/MSstatsSummarizeSingleLinear.Rd around lines 7–14, the Arguments section doesn’t document the expanded signature parameters impute, censored_symbol, remove50missing, aft_iterations, and equal_variances; add individual Rd entries for each parameter including its name, expected type, default value (impute — logical or method, censored_symbol — character scalar, remove50missing — logical, aft_iterations — integer default 90, equal_variances — logical default TRUE) and a one-sentence description of what it controls and its effect on the function (e.g., impute: whether/how missing values are imputed; censored_symbol: symbol representing censored observations; remove50missing: whether to drop features with ≥50% missing; aft_iterations: number of AFT EM iterations; equal_variances: whether to assume equal variances across groups), keeping wording consistent with other argument entries and updating any \usage or \examples sections if they reference the expanded signature.

coderabbitai · 2025-09-09T14:10:42Z

R/utils_summarization_prepare.R

+    # if (method == "TMP") {
+    input = .prepareTMP(input, impute, censored_symbol)
+    # } else {
+    #     input = .prepareLinear(input, FALSE, censored_symbol)
+    # }
    input


⚠️ Potential issue

Bug: method is ignored; linear path never used.

.prepareSummary always routes to TMP, contradicting linear-weighting work and the AnomalyScores guard in utils_checks.R. This will alter behavior for linear summarization.

Restore dispatch:

-# if (method == "TMP") { -input = .prepareTMP(input, impute, censored_symbol) -# } else { -# input = .prepareLinear(input, FALSE, censored_symbol) -# } +if (identical(method, "TMP")) { + input = .prepareTMP(input, impute, censored_symbol) +} else { + input = .prepareLinear(input, FALSE, censored_symbol) +}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# if (method == "TMP") {

input = .prepareTMP(input, impute, censored_symbol)

# } else {

# input = .prepareLinear(input, FALSE, censored_symbol)

# }

input

if (identical(method, "TMP")) {

input = .prepareTMP(input, impute, censored_symbol)

} else {

input = .prepareLinear(input, FALSE, censored_symbol)

}

input

🤖 Prompt for AI Agents

In R/utils_summarization_prepare.R around lines 110 to 115, the code unconditionally calls .prepareTMP and ignores the method parameter so the linear path is never used; restore proper dispatch by checking the method value (e.g., if (method == "TMP") { input = .prepareTMP(input, impute, censored_symbol) } else { input = .prepareLinear(input, FALSE, censored_symbol) }) and ensure the function returns input; also add a defensive branch or error for unknown/unsupported method values to match the AnomalyScores guard in utils_checks.R.

coderabbitai · 2025-09-09T14:10:42Z

R/utils_summarization.R

+    if (all(!is.na(input$weights))){
+        weight_input = input$weights
+    } else {
+        weight_input = NULL
+    }
+


🛠️ Refactor suggestion

Guard weights: require non-missing, finite, and non-negative before using in lm().

Avoid silent acceptance of NULL/invalid weights (e.g., negative or Inf) which will error or skew fits.

Apply:

- if (all(!is.na(input$weights))){ - weight_input = input$weights - } else { - weight_input = NULL - } + if (!is.null(input$weights) && + all(is.finite(input$weights)) && + all(input$weights >= 0) && + all(!is.na(input$weights))) { + weight_input = input$weights + } else { + weight_input = NULL + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (all(!is.na(input$weights))){

weight_input = input$weights

} else {

weight_input = NULL

}

if (!is.null(input$weights) &&

all(is.finite(input$weights)) &&

all(input$weights >= 0) &&

all(!is.na(input$weights))) {

weight_input = input$weights

} else {

weight_input = NULL

}

🤖 Prompt for AI Agents

In R/utils_summarization.R around lines 159 to 164, the current guard only checks for non-NA weights and may pass invalid values (NA, NaN, Inf, negative) to lm(); update the validation to ensure input$weights is numeric and contains only finite, non-missing, and non-negative values before assigning weight_input, otherwise set weight_input <- NULL; specifically reject or coerce non-numeric inputs, check all(is.finite(weights)) and all(weights >= 0), and only then use them in lm(), otherwise fall back to NULL to avoid errors or skewed fits.

coderabbitai · 2025-09-09T14:10:42Z

src/linear_summary.cpp

 // [[Rcpp::export]]
-NumericVector get_linear_summary(const DataFrame& input,
+DataFrame get_linear_summary(const DataFrame& input,
                                 const NumericVector& coefs,
                                 const NumericMatrix& counts,
-                                 const bool is_labeled) {
+                                 const bool is_labeled,
+                                 const NumericMatrix& cov_mat) {
    CharacterVector runs = input["RUN"];


🛠️ Refactor suggestion

Validate cov_mat shape and alignment with coefs

Add defensive checks to fail fast on mismatched dimensions and non-square inputs.

DataFrame get_linear_summary(const DataFrame& input, const NumericVector& coefs, const NumericMatrix& counts, const bool is_labeled, const NumericMatrix& cov_mat) { + // Basic validation + if (cov_mat.nrow() != cov_mat.ncol()) { + stop("cov_mat must be a square matrix (n x n)."); + } + if (cov_mat.nrow() != coefs.size()) { + stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs)."); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// [[Rcpp::export]]

NumericVector get_linear_summary(const DataFrame& input,

DataFrame get_linear_summary(const DataFrame& input,

const NumericVector& coefs,

const NumericMatrix& counts,

const bool is_labeled) {

const bool is_labeled,

const NumericMatrix& cov_mat) {

CharacterVector runs = input["RUN"];

// [[Rcpp::export]]

DataFrame get_linear_summary(const DataFrame& input,

const NumericVector& coefs,

const NumericMatrix& counts,

const bool is_labeled,

const NumericMatrix& cov_mat) {

// Basic validation

if (cov_mat.nrow() != cov_mat.ncol()) {

stop("cov_mat must be a square matrix (n x n).");

}

if (cov_mat.nrow() != coefs.size()) {

stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs).");

}

CharacterVector runs = input["RUN"];

// ... rest of function ...

}

🤖 Prompt for AI Agents

In src/linear_summary.cpp around lines 168 to 174, add defensive validation to ensure cov_mat is square and its dimensions match the number of coefficients: check cov_mat.nrow() == cov_mat.ncol() and cov_mat.nrow() == coefs.size(); if either check fails call Rcpp::stop with a clear message (e.g. "cov_mat must be square and size must equal length(coefs)"); perform these checks early (immediately after obtaining inputs) so the function fails fast on dimension mismatches.

coderabbitai · 2025-09-09T14:10:42Z

vignettes/MSstatsPlus.Rmd

+   metric.9 Possible values are `mean_decrease`, `mean_increase`, and 
+   `dispersion_increase`. These indicate the direction a metric is changes which
+   indicates a drop in quality (i.e., decrease, increase, or can go in either
+   direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a 
+   deviation in expected retention can be either higher or lower.
+- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The 


⚠️ Potential issue

Fix typo and grammar in parameter description

Stray “9” and wording nit in anomalyModelFeatureTemporal.

- metric.9 Possible values are `mean_decrease`, `mean_increase`, and - `dispersion_increase`. These indicate the direction a metric is changes which + metric. Possible values are `mean_decrease`, `mean_increase`, and + `dispersion_increase`. These indicate the direction a metric changes which indicates a drop in quality (i.e., decrease, increase, or can go in either

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

metric.9 Possible values are `mean_decrease`, `mean_increase`, and

`dispersion_increase`. These indicate the direction a metric is changes which

indicates a drop in quality (i.e., decrease, increase, or can go in either

direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a

deviation in expected retention can be either higher or lower.

- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The

metric Possible values are `mean_decrease`, `mean_increase`, and

`dispersion_increase`. These indicate the direction a metric changes which

indicates a drop in quality (i.e., decrease, increase, or can go in either

direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a

deviation in expected retention can be either higher or lower.

- `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The

🤖 Prompt for AI Agents

In vignettes/MSstatsPlus.Rmd around lines 141 to 146, the parameter description for anomalyModelFeatureTemporal contains a stray "9" after "metric." and awkward wording; remove the stray "9" and rephrase the sentence for clarity (e.g., "Possible values are `mean_decrease`, `mean_increase`, and `dispersion_increase`. These indicate the direction a metric changes that corresponds to a drop in quality..."), and fix the surrounding grammar/wording (e.g., "indicates the direction a metric changes" -> "indicate the direction a metric changes") so the paragraph reads cleanly and flows correctly.

coderabbitai · 2025-09-09T14:10:43Z

vignettes/MSstatsPlus.Rmd

+                       "FG.ShapeQualityScore (MS1)",
+                       "EG.ApexRT")
+anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures)
+


🛠️ Refactor suggestion

Avoid triple-colon usage in vignette

MSstatsConvert:::.standardizeColnames is internal; vignettes should not rely on internals. Show a documented alternative or inline a simple sanitizer.

-anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures) +# Example: simple standardization without relying on internals +anomalyModelFeatures = gsub("\\s+", "", anomalyModelFeatures) # drop spaces +anomalyModelFeatures = gsub("\\.", "", anomalyModelFeatures) # drop dots +anomalyModelFeatures = gsub("[^A-Za-z0-9()]", "", anomalyModelFeatures) # keep alnum/() +# Or, if MSstatsConvert exposes a public helper, prefer that here.

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

R/dataProcess.R (1)

220-225: Export aft_iterations to workers in PSOCK clusters

aft_iterations is used inside parLapply closures but not exported to the cluster; this may cause “object 'aft_iterations' not found” on some setups.

Apply:

 parallel::clusterExport(cl, c("MSstatsSummarizeSingleTMP", 
                               "MSstatsSummarizeSingleLinear",
                               "input", "impute", "censored_symbol",
-                              "remove50missing", "protein_indices", 
-                              "equal_variance"), 
+                              "remove50missing", "protein_indices", 
+                              "equal_variance", "aft_iterations"), 
                               envir = function_environment)

♻️ Duplicate comments (4)

R/utils_output.R (1)

111-115: Bug: .finalizeInput always routes through TMP; linear summarization path is dead.
Restoring method-based branching is necessary; otherwise linear runs get TMP-specific fields and imputation logic.

Apply:

-    # if (method == "TMP") {
-    input = .finalizeTMP(input, censored_symbol, impute, summarized)
-    # } else {
-    #     input = .finalizeLinear(input, censored_symbol)
-    # }
+    if (identical(method, "TMP")) {
+        input = .finalizeTMP(input, censored_symbol, impute, summarized)
+    } else {
+        input = .finalizeLinear(input, censored_symbol)
+    }

R/dataProcess.R (2)

375-393: Add robust error handling for survival fit and predict in linear path

Same concern as previously raised: failures in .fitSurvival/predict can silently produce NA and derail downstream steps. Add tryCatch and logging; also compute imputation_var only on success.

Apply:

-    if (impute & any(single_protein[["censored"]])) {
-        survival_fit = .fitSurvival(single_protein[LABEL == "L", cols,
-                                                   with = FALSE],
-                                    aft_iterations)
-        sigma2 = survival_fit$scale^2
-        single_protein[, c("predicted", "imputation_var") := {
-            pred = predict(survival_fit, newdata = .SD, se.fit = TRUE)
-            list(pred$fit, pred$se.fit^2 + sigma2)
-        }]
+    if (impute && any(single_protein[["censored"]])) {
+        survival_fit <- tryCatch(
+            .fitSurvival(single_protein[LABEL == "L", cols, with = FALSE],
+                         aft_iterations),
+            error = function(e) {
+                msg <- paste("*** warning: Survival model fitting failed for protein",
+                             unique(single_protein$PROTEIN), ":", conditionMessage(e))
+                getOption("MSstatsLog")("WARN", msg)
+                NULL
+            }
+        )
+        if (!is.null(survival_fit)) {
+            sigma2 <- survival_fit$scale^2
+            single_protein[, c("predicted", "imputation_var") := {
+                pr <- tryCatch(
+                    predict(survival_fit, newdata = .SD, se.fit = TRUE),
+                    error = function(e) {
+                        msg <- paste("*** warning: Survival predict failed for protein",
+                                     unique(single_protein$PROTEIN), ":", conditionMessage(e))
+                        getOption("MSstatsLog")("WARN", msg)
+                        list(fit = rep(NA_real_, .N), se.fit = rep(NA_real_, .N))
+                    }
+                )
+                list(pr$fit, ifelse(is.na(pr$se.fit), NA_real_, pr$se.fit^2 + sigma2))
+            }]
+        } else {
+            single_protein[, c("predicted", "imputation_var") := list(NA_real_, NA_real_)]
+        }

394-399: Validate ANOMALYSCORES before reciprocal; guard missing/invalid values

Prevent division by zero/negatives/Inf and missing column cases; log and fall back to NA weights. Same as earlier suggestion.

Apply:

-    if (all(!is.na(single_protein$ANOMALYSCORES))){
-        single_protein$weights = 1 / single_protein$ANOMALYSCORES
-    } else {
-        single_protein$weights = NA
-    }
+    if ("ANOMALYSCORES" %in% colnames(single_protein) &&
+        all(is.finite(single_protein$ANOMALYSCORES))) {
+        if (any(single_protein$ANOMALYSCORES <= 0)) {
+            msg <- paste("*** warning: Invalid (<=0) anomaly scores for protein",
+                         unique(single_protein$PROTEIN), "- weights not applied")
+            getOption("MSstatsLog")("WARN", msg)
+            single_protein$weights <- NA_real_
+        } else {
+            single_protein$weights <- 1 / single_protein$ANOMALYSCORES
+        }
+    } else {
+        single_protein$weights <- NA_real_
+    }

R/utils_groupcomparison.R (1)

139-147: Validate Variance before computing weights (duplicate of prior ask)

Guard against zero/negative/Inf variances to avoid invalid weights and model errors.

Apply:

-    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
-        weight_input = 1/input$Variance
+    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
+        if (any(input$Variance <= 0 | is.infinite(input$Variance), na.rm = TRUE)) {
+            msg = paste("*** warning: Invalid variance values detected for protein",
+                        unique(input$Protein), "- weights will not be applied")
+            getOption("MSstatsLog")("WARN", msg)
+            weight_input = NULL
+        } else {
+            weight_input = 1/input$Variance
+        }
     } else {
         weight_input = NULL
     }

🧹 Nitpick comments (11)

R/utils_output.R (1)
33-33: Use a named argument for the new 7th parameter in the example.
Keeps the example resilient to future signature shifts and clearer for users.

Apply:
-#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, 100)
+#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, aft_iterations = 100)
Please confirm the exported name is indeed aft_iterations.
R/dataProcess.R (4)
36-37: Clarify requirement wording for anomaly detection + linear summarization

Suggest making it explicit that anomaly-derived weights are only supported with summaryMethod = "linear", and that TMP will ignore/anonymize anomaly weights.

Apply:
-#' which is robust estimation method. "linear" uses linear mixed model. If 
-#' anomaly detection algorithm is performed, "linear" must be used.
+#' which is a robust estimation method. "linear" uses a (mixed) linear model.
+#' When anomaly detection is used (weights present), summaryMethod must be "linear";
+#' TMP does not apply anomaly-derived weights.
289-290: Examples align with new signatures

Examples compile with aft_iterations in place; consider adding one example that sets equal_variances = FALSE.

Also applies to: 355-356

394-399: Optional: normalize weights to mean 1 to stabilize optimization

If the downstream model uses observation-level weights, scaling weights to mean 1 can improve numerical stability without changing estimates.

Example:
w <- 1 / single_protein$ANOMALYSCORES
single_protein$weights <- w / mean(w, na.rm = TRUE)
333-334: R CMD check NOTE noise: declare globals

Consider adding LABEL, n_obs, n_obs_run, censored, predicted, imputation_var to the NULL globals list to suppress NOTES.
man/MSstatsSummarizeSingleLinear.Rd (2)
27-28: Grammar fix in argument description

“observation are” → “observations are”.

Apply in the roxygen source (R/dataProcess.R):
-#' @param equal_variances if TRUE, observation are assumed to be homoskedastic
+#' @param equal_variances if TRUE, observations are assumed to be homoskedastic
36-51: Docs match new signature; consider documenting weights behavior

Since anomaly-derived weights are now used in linear summarization, add a sentence in Description or Details noting how weights are interpreted (e.g., inverse anomaly scores, expected to be positive/finite).
R/utils_groupcomparison.R (4)
139-147: Optional: stabilize weights and avoid silent row drops

Extremely small variances can dominate fits; scale weights to mean=1 and cap extremes.

If any NA weights remain, lm/lmer silently drop rows; consider warning with counts and, if many are dropped, disabling weights.

Example:
-        } else {
-            weight_input = 1/input$Variance
+        } else {
+            w <- 1/input$Variance
+            invalid <- !is.finite(w) | w <= 0
+            if (any(invalid)) {
+                getOption("MSstatsLog")("WARN",
+                   paste("*** warning:", sum(invalid), "observations have invalid weights and will be dropped"))
+                w[invalid] <- NA_real_
+            }
+            # stabilize
+            w <- w / mean(w, na.rm = TRUE)
+            # optional cap
+            w <- pmin(w, quantile(w, 0.99, na.rm = TRUE))
+            weight_input <- w
         }
148-152: Use scalar OR (||) to avoid vectorized logic surprises

Use || for single boolean flags.
-        if (!has_tech_replicates | is_single_subject) {
+        if (!has_tech_replicates || is_single_subject) {
139-147: Potential mismatch between modeled data and downstream checks when weights drop rows

If NA/invalid weights cause row exclusion, .getAllComparisons() still uses the original input to assess empty conditions and build contrasts. Consider passing the actual model frame rows (or a filtered input aligned to weights) into .getAllComparisons() to avoid inconsistencies.

Would you like me to draft a minimal change that derives the model frame with model.frame(full_fit) and threads it through to .getAllComparisons()?

Also applies to: 280-296

466-479: Parallel workers: pre-load namespaces for stability and perf

Exporting functions works, but pre-loading packages on workers avoids first-call overhead and surprises with S3/S4 methods.
   cl = parallel::makeCluster(numberOfCores)
+  parallel::clusterEvalQ(cl, {
+    library(lme4)
+    library(data.table)
+    # library(MSstats) # optional if running from installed package
+    NULL
+  })
   parallel::clusterExport(cl, c("MSstatsGroupComparisonSingleProtein", 
                                  "contrast_matrix", "repeated", 
                                  "groups", "samples_info", 
                                  "save_fitted_models", "has_imputed"), 
                            envir = function_environment)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcdfd78 and beecaae.

📒 Files selected for processing (12)

R/dataProcess.R (15 hunks)
R/groupComparison.R (3 hunks)
R/utils_censored.R (1 hunks)
R/utils_groupcomparison.R (4 hunks)
R/utils_output.R (3 hunks)
man/MSstatsSummarizationOutput.Rd (2 hunks)
man/MSstatsSummarizeSingleLinear.Rd (2 hunks)
man/MSstatsSummarizeSingleTMP.Rd (3 hunks)
man/MSstatsSummarizeWithMultipleCores.Rd (2 hunks)
man/MSstatsSummarizeWithSingleCore.Rd (3 hunks)
man/dataProcess.Rd (3 hunks)
src/linear_summary.cpp (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (8)

R/utils_censored.R
man/MSstatsSummarizeWithMultipleCores.Rd
man/dataProcess.Rd
man/MSstatsSummarizationOutput.Rd
man/MSstatsSummarizeWithSingleCore.Rd
man/MSstatsSummarizeSingleTMP.Rd
R/groupComparison.R
src/linear_summary.cpp

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (10)

R/utils_output.R (1)

13-16: Docs return block is concise and accurate.
Matches the actual return structure of MSstatsSummarizationOutput.

R/dataProcess.R (6)

131-132: API change looks good; params propagated to validation

Adding aft_iterations to the signature and passing colnames(raw) to .checkDataProcessParams is consistent. No action needed.

Also applies to: 142-144

480-500: Nice: convergence handling for TMP imputation

Catches convergence warnings and sets predicted to NA; consistent with robust workflows.

209-212: Threading aft_iterations end-to-end looks correct

Signatures updated and forwarded through both parallel and single-core paths.

Also applies to: 256-261, 294-295

205-206: Note: ensure parallel imports cover parApply variants used

The @importFrom tag lists makeCluster/parLapply/stopCluster/clusterExport; OK as is.

422-427: Good: use vcov(fit) and pass cov_mat to get_linear_summary

This aligns with variance propagation into DA.

366-387: No changes needed for column naming consistency
The code defines the logical censored flag and then creates a numeric cen indicator (0 for censored, 1 otherwise) for use in the Surv model. Both names coexist by design and are correctly applied. (rdrr.io)

R/utils_groupcomparison.R (3)

89-93: LGTM: signature formatting only

No functional change; keeping params stable is good for API compatibility.

168-169: LGTM: weighted lm for single-subject time-course

Appropriate to thread weights through this branch.

505-506: LGTM: single-core signature formatting

No behavior change; keeps API aligned with multi-core path.

coderabbitai · 2025-09-09T17:10:47Z

R/dataProcess.R

+        survival = single_protein[, c(cols, "predicted"), with = FALSE]
+    } else {
+        survival = single_protein[, cols, with = FALSE]
+        survival[, predicted := NA]
+    }
+


🛠️ Refactor suggestion

Preserve and return imputation variance with survival output

imputation_var is computed but dropped from survival; include it for downstream DA that uses weights/variance.

Apply:

- survival = single_protein[, c(cols, "predicted"), with = FALSE] + survival = single_protein[, c(cols, "predicted", "imputation_var"), with = FALSE] @@ - survival = single_protein[, cols, with = FALSE] - survival[, predicted := NA] + survival = single_protein[, cols, with = FALSE] + survival[, `:=`(predicted = NA_real_, imputation_var = NA_real_)]

Also applies to: 429-432

🤖 Prompt for AI Agents

In R/dataProcess.R around lines 388-393, the imputation_var column is computed but not included in the survival data.frame; update both branches so survival includes imputation_var (e.g., select cols plus "predicted" and "imputation_var" in the first branch, and cols plus "imputation_var" in the else branch while still adding predicted := NA), and make the identical change at lines ~429-432 to ensure imputation variance is preserved and returned for downstream DA weighting/variance use.

coderabbitai · 2025-09-09T17:10:47Z

R/dataProcess.R

+    # fit = try(, silent = TRUE)
+    fit = .fitLinearModel(single_protein, is_single_feature, 
+                          is_labeled = label, equal_variances)

    if (inherits(fit, "try-error")) {
-        msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN))
+        msg = paste("*** error : can't fit the model for ", 
+                    unique(single_protein$PROTEIN))


⚠️ Potential issue

Restore try/tryCatch around .fitLinearModel; current error check is ineffective

fit is not wrapped in try, yet you test inherits(fit, "try-error"). A model failure will raise and skip the handler. Wrap the call.

Apply:

- # fit = try(, silent = TRUE) - fit = .fitLinearModel(single_protein, is_single_feature, - is_labeled = label, equal_variances) + fit <- try( + .fitLinearModel(single_protein, is_single_feature, + is_labeled = label, equal_variances), + silent = TRUE + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# fit = try(, silent = TRUE)

fit = .fitLinearModel(single_protein, is_single_feature,

is_labeled = label, equal_variances)

if (inherits(fit, "try-error")) {

msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN))

msg = paste("*** error : can't fit the model for ",

unique(single_protein$PROTEIN))

fit <- try(

.fitLinearModel(single_protein, is_single_feature,

is_labeled = label, equal_variances),

silent = TRUE

)

if (inherits(fit, "try-error")) {

msg = paste("*** error : can't fit the model for ",

unique(single_protein$PROTEIN))

...

🤖 Prompt for AI Agents

In R/dataProcess.R around lines 410-416, the call to .fitLinearModel is not wrapped in try/tryCatch yet the code tests inherits(fit, "try-error"), so model failures will raise and skip the handler; wrap the .fitLinearModel call in try(..., silent = TRUE) or use tryCatch to catch errors and return an object that can be detected (e.g., assign the error to fit with class "try-error" or a sentinel), then keep the existing inherits(fit, "try-error") branch to handle failures and log the message as before.

coderabbitai · 2025-09-09T17:10:47Z

R/utils_groupcomparison.R

+.fitModelForGroupComparison = function(input, repeated, 
+                                       is_single_subject, has_tech_replicates) {
+
+    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
+        weight_input = 1/input$Variance
+    } else {
+        weight_input = NULL
+    }
+


💡 Verification agent

🧩 Analysis chain

Sanity-check: tests assume full_fit$weights exists

For lm this holds; for lmer use weights(full_fit). Ensure tests only assert on lm path or adapt accordingly.

Run:

🏁 Script executed:

#!/bin/bash # Locate tests that access full_fit$weights and confirm they don't assume lmer objects rg -nP -C2 'full_fit\\$weights|weights\$full_fit\$' inst tests

Length of output: 0

🏁 Script executed:

#!/bin/bash # List directory structure to locate test folders fd -t d . # Search for direct references to full_fit$weights in test scripts rg -n 'full_fit\$weights' -C2 --glob '*test*.R' # Search for calls to weights(full_fit) in test scripts rg -n 'weights\s*$\s*full_fit\s*$' -C2 --glob '*test*.R'

Length of output: 871

Adapt tests to handle weights in lmer models
Inst/tinytest/test_utils_groupComparison_model.R (lines 14, 17) reference result$full_fit$weights, which exists for lm fits but not for lmer. Either assign full_fit$weights <- weights(full_fit) in the mixed-effects branch or update the tests to call weights(full_fit) only on the lmer path.

🤖 Prompt for AI Agents

In R/utils_groupcomparison.R around lines 139-147, the code sets weight_input but downstream tests expect result$full_fit$weights to exist for lm fits while lmer fits don't expose that field; in the mixed-effects branch assign the model weights into the fitted object (e.g. set full_fit$weights <- weights(full_fit) after fitting the lmer model) so result$full_fit$weights is present, or alternatively update the tests to call weights(full_fit) only for lmer objects—implement the former by adding that assignment immediately after the lmer fit.

coderabbitai · 2025-09-09T17:10:47Z

R/utils_groupcomparison.R

            full_fit = suppressMessages(try(
-                lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input),
+                lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), 
+                           data = input, weights=weight_input),
                TRUE
            ))
            df_full = suppressMessages(try(
-                lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,
+                lm(ABUNDANCE ~ GROUP + SUBJECT, 
+                   data = input, weights=weight_input)$df.residual,
                TRUE
            ))
        }


🛠️ Refactor suggestion

Harden df_full computation when lmer path is used

If the lm fallback fails, df_full becomes a try-error and later code treats it as numeric. Provide a safe fallback.

- df_full = suppressMessages(try( - lm(ABUNDANCE ~ GROUP + SUBJECT, - data = input, weights=weight_input)$df.residual, - TRUE - )) + df_full = suppressMessages(try( + lm(ABUNDANCE ~ GROUP + SUBJECT, + data = input, weights=weight_input)$df.residual, + TRUE + )) + if (inherits(df_full, "try-error")) { + df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)), + error = function(e) NA_real_) + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

full_fit = suppressMessages(try(

lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input),

lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),

data = input, weights=weight_input),

TRUE

))

df_full = suppressMessages(try(

lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,

lm(ABUNDANCE ~ GROUP + SUBJECT,

data = input, weights=weight_input)$df.residual,

TRUE

))

}

full_fit = suppressMessages(try(

lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT),

data = input, weights=weight_input),

TRUE

))

df_full = suppressMessages(try(

lm(ABUNDANCE ~ GROUP + SUBJECT,

data = input, weights=weight_input)$df.residual,

TRUE

))

if (inherits(df_full, "try-error")) {

df_full <- tryCatch(

stats::nobs(full_fit) - length(lme4::fixef(full_fit)),

error = function(e) NA_real_

)

}

🤖 Prompt for AI Agents

In R/utils_groupcomparison.R around lines 153 to 163, the df_full assignment from the lm fallback can produce a "try-error" when the lm call fails but later code assumes df_full is numeric; update the logic to detect a failed try (e.g. inherits(..., "try-error") or is.null) and provide a safe numeric fallback (such as NA_real_ or a computed residual df from model dimensions) before it is used downstream; ensure the check covers both full_fit and df_full failure cases and that df_full is explicitly coerced to numeric when successful.

coderabbitai · 2025-09-09T17:10:47Z

R/utils_groupcomparison.R

+                    lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), 
+                               data = input, weights=weight_input),
                    TRUE
                ))
                df_full = suppressMessages(try(
-                    lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual,
+                    lm(ABUNDANCE ~ GROUP + SUBJECT, 
+                       data = input, weights=weight_input)$df.residual,
                    TRUE))


🛠️ Refactor suggestion

Repeat df_full guard in this branch

Mirror the safe fallback after the lm() try, as above.

- df_full = suppressMessages(try( - lm(ABUNDANCE ~ GROUP + SUBJECT, - data = input, weights=weight_input)$df.residual, - TRUE)) + df_full = suppressMessages(try( + lm(ABUNDANCE ~ GROUP + SUBJECT, + data = input, weights=weight_input)$df.residual, + TRUE)) + if (inherits(df_full, "try-error")) { + df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)), + error = function(e) NA_real_) + }

🤖 Prompt for AI Agents

In R/utils_groupcomparison.R around lines 174 to 181, the df_full assignment uses try(... )$df.residual directly which will fail when try returns an error; mirror the safe fallback used earlier by capturing the try result, check for a "try-error" (or use inherits(..., "try-error")) and if an error set df_full to the same safe fallback (e.g., NA or 0) otherwise extract $df.residual; update this branch to follow the same guard pattern as the lm() try above so df_full is always defined safely.

coderabbitai · 2025-09-09T17:10:47Z

R/utils_groupcomparison.R

+                    lme4::lmer(
+                        ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT),
+                        data = input, weights=weight_input),
                    TRUE
                ))
                df_full = suppressMessages(try(
                    lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
-                       data = input)$df.residual,
+                       data = input, weights=weight_input)$df.residual,
                    TRUE


🛠️ Refactor suggestion

And here as well: protect df_full and keep behavior consistent

Add the same fallback computation for df_full in the tech-replicate branch.

- df_full = suppressMessages(try( - lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, - data = input, weights=weight_input)$df.residual, - TRUE - )) + df_full = suppressMessages(try( + lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, + data = input, weights=weight_input)$df.residual, + TRUE + )) + if (inherits(df_full, "try-error")) { + df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)), + error = function(e) NA_real_) + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

lme4::lmer(

ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT),

data = input, weights=weight_input),

TRUE

))

df_full = suppressMessages(try(

lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,

data = input)$df.residual,

data = input, weights=weight_input)$df.residual,

TRUE

# existing try to compute residual df from lm

df_full = suppressMessages(try(

lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,

data = input, weights=weight_input)$df.residual,

TRUE

))

# fallback if lm() fails

if (inherits(df_full, "try-error")) {

df_full <- tryCatch(

stats::nobs(full_fit) - length(lme4::fixef(full_fit)),

error = function(e) NA_real_

)

}

🤖 Prompt for AI Agents

In R/utils_groupcomparison.R around lines 184 to 192, df_full in the tech-replicate branch is not protected and lacks the same fallback used elsewhere; wrap the lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, data = input, weights=weight_input)$df.residual call in suppressMessages(try(..., TRUE)) and if that returns an error, compute df_full using the identical fallback logic used in the other branch (e.g., the same nrow/input-based fallback calculation) so behavior is consistent and failures are handled gracefully.

tonywu1999 · 2025-09-10T18:31:18Z

R/dataProcess.R

                }
                single_protein = input[protein_indices[[i]],]
-                MSstatsSummarizeSingleLinear(single_protein, equal_variance)
+                MSstatsSummarizeSingleLinear(


equal_variance is not passed into MSstatsSummarizeSingleLinear

tonywu1999 · 2025-09-10T18:33:09Z

R/dataProcess.R

            single_protein = input[protein_indices[[protein_id]],]
-            summarized_result = MSstatsSummarizeSingleLinear(single_protein,
-                                                             equal_variance)
+            summarized_result = MSstatsSummarizeSingleLinear(


equal_variance is not passed into MSstatsSummarizeSingleLinear

devonjkohler and others added 21 commits April 4, 2025 08:53

added weights to groupComparison

35747e9

adding anomaly scores to dataProcess

c0a8ac6

anomaly weighted summarization

89c031e

anomaly scores into summaraization and model

1574fd9

fixed anomaly bug

bb7ca01

imputation convergence bug

56dd620

anomaly score to column validator

4655736

Merge branch 'feature-model_weights' of https://github.com/Vitek-Lab/…

1b3c60f

…MSstats into feature-model_weights

bug fix

e14d871

testing remove pca

8ee6fd5

minor bug fixes

2978d18

lm model

323a9bc

bug fix

5aff109

MSstats+ vignette

aa3f5b0

added manual converter example

be2a8d6

documentation bugs

f7a957b

MSstatsBig example

e019a9a

test(anomaly_scores): Add unit tests for dataProcess and groupCompari…

3a184ca

…son for anomaly score feature

add defaults for aft_iterations parameter to prevent Shiny from breaking

c71c16c

Add more granular unit tests for data summarization subfunctions

c140362

Merge pull request #173 from Vitek-Lab/tony-feature-model_weights

dcdfd78

test(anomaly_scores): Add unit tests for dataProcess and groupComparison for anomaly score feature

coderabbitai bot reviewed Sep 9, 2025

View reviewed changes

merge with changes in devel

beecaae

coderabbitai bot reviewed Sep 9, 2025

View reviewed changes

devonjkohler requested a review from tonywu1999 September 10, 2025 15:09

tonywu1999 reviewed Sep 10, 2025

View reviewed changes

Vitek-Lab deleted a comment from coderabbitai bot Sep 10, 2025

tonywu1999 reviewed Sep 10, 2025

View reviewed changes

minor text edits to msstatsplus vignette

5b972e7

devonjkohler merged commit fea8488 into devel Sep 10, 2025
0 of 2 checks passed

		output = output = MSstatsSummarizationOutput(input, summarized, processed,
		method, impute, cens)

-  aft_iterations,
+\usage{
+ MSstatsSummarize(
+   proteins_list,
+   method,
+   impute,
+   censored_symbol,
+   remove50missing,
+   aft_iterations,
+   equal_variance
+ )
+}
+\arguments{
+ \item{proteins_list}{list of processed feature-level data}
+ \item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'.
+ TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model.
+ FALSE uses the values assigned by cutoffCensored}
+ \item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.}
+}
+\examples{
+ summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)
+}

Feature model weights #174

Feature model weights #174

Uh oh!

Conversation

devonjkohler commented Sep 9, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Changes

Testing

Checklist Before Requesting a Review

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks (2 warnings, 1 inconclusive)

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

tonywu1999 Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

devonjkohler commented Sep 9, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 9, 2025 •

edited

Loading

tonywu1999 Sep 10, 2025 •

edited

Loading