-
Notifications
You must be signed in to change notification settings - Fork 50
Feature model weights #174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…MSstats into feature-model_weights
…son for anomaly score feature
test(anomaly_scores): Add unit tests for dataProcess and groupComparison for anomaly score feature
|
Caution Review failedThe pull request is closed. WalkthroughAdds variance-aware linear summarization and anomaly-derived weighting across the pipeline. Threads a new aft_iterations param and covariance matrices through R and C++ layers, updates get_linear_summary to return LogIntensities+Variance, relaxes censored handling, adds anomaly-weight utilities, updates docs/tests, and adds a vignette. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant R as dataProcess / MSstatsSummarize*
participant L as MSstatsSummarizeSingleLinear
participant F as lm/lmer + vcov
participant C as .Call(_MSstats_get_linear_summary)
participant CPP as get_linear_summary (C++)
R->>L: single_protein, impute, censored_symbol, remove50missing, aft_iterations, equal_variances
L->>F: fit model (use newABUNDANCE / weights)
F-->>L: coefficients + vcov (cov_mat)
L->>C: input, coefs, counts, is_labeled, cov_mat
C->>CPP: pass args
CPP-->>C: DataFrame{LogIntensities, Variance}
C-->>L: DataFrame
L-->>R: run-level data (+ Variance) and survival-related data
sequenceDiagram
autonumber
participant GC as groupComparison
participant FM as .fitModelForGroupComparison
participant M as lm/lmer
GC->>FM: summarized input (may include Variance)
FM->>FM: if Variance present -> weight_input = 1/Variance else NULL
FM->>M: fit model with weights=weight_input
M-->>FM: weighted fit (coeffs, SE, p-values)
FM-->>GC: results
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks (2 warnings, 1 inconclusive)❌ Failed checks (2 warnings, 1 inconclusive)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 24
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
R/utils_censored.R (4)
36-44: Guard against empty quantiles / zero IQR producing NaN cutoff and NA flags.When no data or IQR == 0, cutoff_lower becomes NaN; downstream comparisons yield NA and can cascade into errors. Compute vals first, handle IQR <= eps, and allow a NA cutoff to skip quantile-based censoring.
- quantiles = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", - quantile(ABUNDANCE, - prob = c(0.01, 0.25, 0.5, 0.75, - censored_cutoff), - na.rm = TRUE)] - iqr = quantiles[4] - quantiles[2] - multiplier = (quantiles[5] - quantiles[4]) / iqr - cutoff_lower = (quantiles[2] - multiplier * iqr) + vals = input[!is.na(INTENSITY) & INTENSITY > 1 & LABEL == "L", ABUNDANCE] + if (length(vals)) { + quantiles = quantile(vals, + prob = c(0.01, 0.25, 0.5, 0.75, censored_cutoff), + na.rm = TRUE, names = FALSE) + iqr = quantiles[4] - quantiles[2] + if (!is.finite(iqr) || iqr <= .Machine$double.eps) { + cutoff_lower = quantiles[2] + } else { + multiplier = (quantiles[5] - quantiles[4]) / iqr + cutoff_lower = quantiles[2] - multiplier * iqr + } + } else { + cutoff_lower = NA_real_ + }
44-46: Prevent NA-propagation in the censoring predicate.Ensure cutoff_lower is finite before vectorized comparison to avoid NA censor flags.
- input$censored = !is.na(input$INTENSITY) & - input$LABEL == "L" & - input$ABUNDANCE < cutoff_lower + input$censored = is.finite(cutoff_lower) & + !is.na(input$INTENSITY) & + input$LABEL == "L" & + input$ABUNDANCE < cutoff_lower
47-54: Fix if(NA) runtime risk when cutoff_lower is NA.if (cutoff_lower <= 0 & ...) can become if (NA) and error. Also prefer && for scalars; outer guard already ensures missing_symbol is non-null.
- if (cutoff_lower <= 0 & !is.null(missing_symbol) & missing_symbol == "0") { + if (is.finite(cutoff_lower) && missing_symbol == "0" && cutoff_lower <= 0) { zero_one_filter = !is.na(input$ABUNDANCE) & input$ABUNDANCE <= 0 input$censored = ifelse(zero_one_filter, TRUE, input$censored) } - if (!is.null(missing_symbol) & missing_symbol == "NA") { + if (missing_symbol == "NA") { input$censored = ifelse(is.na(input$INTENSITY), TRUE, input$censored) }
32-79: Restrict censored‐flagging to TMP or properly support linear summarization.
MSstatsHandleMissing()ignoressummary_method, so withimpute=TRUEnon-TMP paths now getinput$censoredmarks.- Both
.prepareSummary()(always using.prepareTMP()) andMSstatsSummarizeSingleLinear()apply AFT imputation based on those flags.- Either guard
MSstatsHandleMissing()withif (summary_method=="TMP")(or forceimpute=FALSEfor non-TMP), or re-enable and complete.prepareLinear()/.finalizeLinear()branches and update docs to reflect linear AFT support.R/utils_summarization.R (1)
210-216: WLS not applied: lm() re-fit uses a NULL symbol instead of the computed column.
wls.fit = lm(..., weights = weight)references a local NULL variable, notinput$weight. This disables the intended heteroscedasticity correction.Apply:
- wls.fit = lm(formula(fit), data = input, weights = weight) + wls.fit = lm(formula(fit), data = input, weights = input$weight)R/utils_imputation.R (2)
15-15: Avoid set.seed() in package code.Setting the global RNG seed is a side effect that disrupts users.
survreg()is deterministic; remove the seed.Apply:
- set.seed(100) + # no RNG needed here
60-65: .addSurvivalPredictions fails to pass aft_iterations.This will error after the signature change or silently use the default if added. Thread the parameter through.
Apply:
-.addSurvivalPredictions = function(input) { +.addSurvivalPredictions = function(input, aft_iterations = 90) { LABEL = NULL - survival_fit = .fitSurvival(input[LABEL == "L", ]) + survival_fit = .fitSurvival(input[LABEL == "L", ], aft_iterations) predict(survival_fit, newdata = input) }man/MSstatsSummarizeSingleTMP.Rd (1)
18-27: Fix user-facing typos and terminology (AFT)
- “Accelated failure model.” → “Accelerated Failure Time (AFT) model.”
- “intensites” → “intensities.”
- “Null assumes …” → “NULL assumes …” (R constant).
Apply:
-\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. -TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model. +\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. +TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) using an Accelerated Failure Time (AFT) model. FALSE uses the values assigned by cutoffCensored} @@ -Null assumes that all NA intensites are randomly missing.} +NULL assumes that all NA intensities are randomly missing.}R/groupComparison.R (1)
91-96: Robust labeled check and remove dead code
data$FeatureLevelData$Labelis inconsistent with later use ofLABELand may be NULL for minimal inputs. Either fix case or guard; alsolabeledisn’t used later here.- labeled = data.table::uniqueN(data$FeatureLevelData$Label) > 1 + labeled = FALSE + if (!is.null(data$FeatureLevelData) && + "LABEL" %in% colnames(data$FeatureLevelData)) { + labeled = data.table::uniqueN(data$FeatureLevelData$LABEL) > 1 + }Optionally drop the variable entirely if unused.
src/linear_summary.cpp (1)
184-186: Avoid NA propagation in contrast constructionNumericVector default-initializes to NA; leaving other entries as NA risks NaN results in get_quant/get_quant_var. Initialize to zeros.
- NumericVector contrast_matrix(num_runs); + NumericVector contrast_matrix(num_runs, 0.0); contrast_matrix[run_id] = 1;- NumericVector contrast_matrix(num_runs); + NumericVector contrast_matrix(num_runs, 0.0); contrast_matrix[num_runs - 1] = 1;Also applies to: 197-199
🧹 Nitpick comments (27)
R/utils_censored.R (2)
61-66: Avoid logging NaN/Inf cutoff values.Only log cutoff when finite; still log the generic symbol message.
- getOption("MSstatsMsg")("INFO", msg) - getOption("MSstatsMsg")("INFO", msg_2) - getOption("MSstatsLog")("INFO", msg) - getOption("MSstatsLog")("INFO", msg_2) + if (is.finite(cutoff_lower)) { + getOption("MSstatsMsg")("INFO", msg) + getOption("MSstatsLog")("INFO", msg) + } + getOption("MSstatsMsg")("INFO", msg_2) + getOption("MSstatsLog")("INFO", msg_2)
68-75: Minor: prefer && for scalar checks.Use short-circuit operators in scalar conditionals for clarity and to skip unnecessary RHS work.
- if (missing_symbol == '0') { + if (missing_symbol == '0') { input$censored = input$LABEL == "L" & !is.na(input$INTENSITY) & (input$INTENSITY == 1 | input$ABUNDANCE <= 0) - } else if (missing_symbol == 'NA') { + } else if (missing_symbol == 'NA') { input$censored = input$LABEL == "L" & is.na(input$ABUNDANCE) }R/utils_summarization.R (2)
136-143: Redundant branches in non-missing filter.Both branches under
censored_symbol == "NA"andelseare identical. Simplify.Apply:
- if (!is.null(censored_symbol)) { - if (censored_symbol == "NA") { - nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored - } else { - nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored - } - } else { - nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY) - } + if (!is.null(censored_symbol)) { + nonmissing_filter = input$LABEL == "L" & !is.na(input$newABUNDANCE) & !input$censored + } else { + nonmissing_filter = input$LABEL == "L" & !is.na(input$INTENSITY) + }
211-213: Typo in comment.“valuaes” → “values”.
Apply:
- ## loess fitted valuaes are predicted sd + ## loess fitted values are predicted sdR/utils_anomaly_weights.R (3)
13-16: Optional: validate required columns early.Fail fast if
ANOMALYSCORES,PEPTIDE,censored, orimputation_varare missing.Proposed prelude:
required_cols <- c("ANOMALYSCORES","PEPTIDE","censored","imputation_var") missing <- setdiff(required_cols, names(data)) if (length(missing)) stop("Missing columns: ", paste(missing, collapse=", "))
1-11: Consider simpler normalization.Subtract per-PEPTIDE min without adding
global_min; this guarantees non-negativity and preserves within-peptide contrasts.Apply:
- global_min = min(data$ANOMALYSCORES, na.rm = TRUE) - data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE] - data[, normalized_score := ANOMALYSCORES - local_min + global_min] + data[, local_min := min(ANOMALYSCORES, na.rm = TRUE), by = PEPTIDE] + data[, normalized_score := ANOMALYSCORES - local_min]
1-1: Add data.table import for merge method (if not already).If not present elsewhere, add roxygen import to ensure data.table methods are available.
Add at file top:
#' @import data.tableR/utils_checks.R (2)
154-157: Use numeric NA for AnomalyScores.Set a numeric type to avoid downstream type coercion surprises and to align with later NA_real_ usage.
Apply:
- input$AnomalyScores = NA + input$AnomalyScores = NA_real_
198-202: Minor: improve user-facing error message grammar.Polish for clarity.
- "So far, only label-free or reference-labeled experiment are supported. - stop")) + "So far, only label-free or reference-labeled experiments are supported. - stop"))Also applies to: 204-207
R/utils_summarization_prepare.R (1)
30-33: Ensure ANOMALYSCORES is numeric.Match utils_checks.R (NA_real_) to keep a consistent numeric type.
- if (!"ANOMALYSCORES" %in% colnames(input)) { - input[, ANOMALYSCORES := NA] - } + if (!"ANOMALYSCORES" %in% colnames(input)) { + input[, ANOMALYSCORES := NA_real_] + }R/RcppExports.R (1)
12-14: Add cov_mat validation in get_linear_summary
In R/RcppExports.R, before the .Call, assert that cov_mat is a square matrix with ncol(cov_mat) == length(coefs) and is symmetric, otherwise stop with a clear error.inst/tinytest/test_utils_groupComparison_model.R (1)
16-18: Harden the test against edge cases (zero/NA variances, numeric fuzz).Add a case with 0/NA Variance to assert the model either guards or errors clearly; also consider a tolerance in expect_equal to avoid float issues.
man/MSstatsSummarizeSingleTMP.Rd (1)
53-55: Prefer named argument in example to guard against signature driftUse a named parameter for
aft_iterationsin the example.-single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]], - impute, cens, FALSE, 100) +single_protein_summary = MSstatsSummarizeSingleTMP(input_split[[1]], + impute, cens, FALSE, + aft_iterations = 100)inst/tinytest/test_utils_summarization.R (3)
1-29: Synthetic fixture looks good; minor cleanup optional
ANOMALYSCORESisn’t used in this unit and could be dropped to keep the fixture minimal. No action required if you prefer the explicitness.
31-35: Equality check is fine; consider a tolerance for portabilityDirect equality on numeric vectors can be brittle across platforms. Add a tiny tolerance.
-expect_equal(summarized$weights, single_protein$weights, - info = "Weights from model should match input weights") +expect_equal(summarized$weights, single_protein$weights, tolerance = 1e-12, + info = "Weights from model should match input weights")
41-47: Keying into coefficients by name is brittleIf internal names change, this will break. Optionally guard with
expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients))before comparing, or retrieve by matching the feature column.+expect_true("FEATUREASGLLLER_2_y3_1" %in% names(summarized$coefficients))R/groupComparison.R (1)
97-101: Tighten log/message phrasing“Start to test and get inference in whole plot” is awkward. Shorten and unify LOG/MSG.
- getOption("MSstatsLog")( - "INFO", "== Start to test and get inference in whole plot") - getOption("MSstatsMsg")( - "INFO", " == Start to test and get inference in whole plot ...") + getOption("MSstatsLog")("INFO", "== Starting group comparison on all proteins") + getOption("MSstatsMsg")("INFO", " == Starting group comparison on all proteins ...")inst/tinytest/test_groupComparison.R (2)
49-54: Use a small tolerance for numeric equalityFloating-point quirks can cause spurious failures across platforms.
-expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, +expect_equal(no_anomaly_Q9UFW8$SE, with_anomaly_Q9UFW8$SE, tolerance = 1e-12, info = "SE values for Q9UFW8 should be the same") -expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue, +expect_equal(no_anomaly_Q9UFW8$pvalue, with_anomaly_Q9UFW8$pvalue, tolerance = 1e-12, info = "pvalue values for Q9UFW8 should be the same")
59-62: Inequality assertions are good; optionally add tolerance bandsIf borderline, consider asserting a minimum relative/absolute difference to avoid flakes.
inst/tinytest/test_dataProcess.R (1)
29-48: Consider setting a seed before imputationIf AFT or related steps introduce any stochasticity, set a seed for stability.
+# Ensure deterministic behavior if any stochastic steps are used +set.seed(42) msstats_input = data.table::data.table(src/linear_summary.cpp (1)
193-194: Numerical safety: clamp tiny negative variances to 0Floating-point noise can yield slightly negative values. Clamp to non-negative.
- variances[run_id] = get_quant_var(contrast, cov_mat); + variances[run_id] = std::max(0.0, get_quant_var(contrast, cov_mat));- variances.push_back(get_quant_var(contrast, cov_mat)); + variances.push_back(std::max(0.0, get_quant_var(contrast, cov_mat)));Also applies to: 203-204
vignettes/MSstatsPlus.Rmd (4)
116-123: Standardize column names or use named args for clarityThe example uses sanitized column names (
FGShapeQualityScore(MS2)) while earlier text uses raw Spectronaut headers with spaces/dots. Clarify that the converter standardizes names, or standardize explicitly to avoid user confusion.msstats_input = MSstatsConvert::SpectronauttoMSstatsFormat( spectronaut_raw, annotation, intensity = 'PeakArea', excludedFromQuantificationFilter = TRUE, # Pre-filtering calculateAnomalyScores=TRUE, # Key parameter to use MSstats+ - anomalyModelFeatures=c("FGShapeQualityScore(MS2)", - "FGShapeQualityScore(MS1)", - "EGDeltaRT"), # Quality metrics + # Use standardized column names (no dots/spaces) or call name standardizer + anomalyModelFeatures=c("FGShapeQualityScore(MS2)", + "FGShapeQualityScore(MS1)", + "EGDeltaRT"), # Quality metrics (standardized)And add a short note right below explaining that MSstatsConvert standardizes feature names (dots/spaces removed), or show how to map raw headers to standardized ones.
347-362: Fix axis labels in the fold-change plotLabels don’t match aesthetics (x=Protein, y=abs(log2FC)).
ggplot(data=merged_results) + geom_col(aes(x=Protein, y=abs(log2FC), fill=Model), position="dodge") + geom_hline(yintercept=1, linetype="dashed", color="black", linewidth=1) + theme_bw(base_size =16) + labs(title="Log fold change comparison", - x="MSstats+", - y="Base MSstats") + x="Protein", + y="|log2FC|")
367-373: Fix axis labels in the SE plotSame issue as above.
ggplot(data=merged_results) + geom_col(aes(x=Protein, y=SE, fill=Model), position="dodge") + theme_bw(base_size =16) + labs(title="Standard error comparison", - x="MSstats+", - y="Base MSstats") + x="Protein", + y="SE")
260-265: Use named arguments for MSstatsAnomalyScores exampleUnlabeled positional args reduce readability and risk misordering.
input = MSstatsConvert::MSstatsAnomalyScores( - msstats_input, - c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), - c("mean_decrease", "mean_decrease", "dispersion_increase"), - .5, 100, run_order, 100, "auto", 1) + data = msstats_input, + features = c("FG.ShapeQualityScore (MS1)", "FG.ShapeQualityScore (MS2)", "EG.DeltaRT"), + featureTemporal = c("mean_decrease", "mean_decrease", "dispersion_increase"), + trainQuantile = 0.5, sampleSize = 100, runOrder = run_order, + n_trees = 100, max_depth = "auto", numberOfCores = 1)R/utils_groupcomparison.R (1)
466-478: Validatehas_imputedattribute presence and type
After retrievinghas_imputed <- attr(summarized_list, "has_imputed"), add a check to ensure it exists and is a length-one logical, for example:if (!is.logical(has_imputed) || length(has_imputed) != 1) { stop("`summarized_list` must have a single logical `has_imputed` attribute") }This prevents unexpected NULL or non-logical values from causing runtime errors.
R/dataProcess.R (1)
128-129: Document the default value choice for aft_iterationsThe default value of 90 iterations for AFT model fitting seems arbitrary. Consider documenting why this specific value was chosen or make it configurable based on convergence criteria.
Consider adding a comment explaining the rationale:
- numberOfCores = 1, aft_iterations=90 + numberOfCores = 1, aft_iterations=90 # Default based on empirical testing for typical proteomics datasets
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
R/RcppExports.R(1 hunks)R/dataProcess.R(15 hunks)R/groupComparison.R(3 hunks)R/utils_anomaly_weights.R(1 hunks)R/utils_censored.R(1 hunks)R/utils_checks.R(5 hunks)R/utils_groupcomparison.R(4 hunks)R/utils_imputation.R(2 hunks)R/utils_output.R(2 hunks)R/utils_summarization.R(1 hunks)R/utils_summarization_prepare.R(2 hunks)inst/tinytest/test_dataProcess.R(1 hunks)inst/tinytest/test_groupComparison.R(1 hunks)inst/tinytest/test_utils_groupComparison_model.R(1 hunks)inst/tinytest/test_utils_summarization.R(1 hunks)man/MSstatsSummarizationOutput.Rd(1 hunks)man/MSstatsSummarize.Rd(1 hunks)man/MSstatsSummarizeSingleLinear.Rd(2 hunks)man/MSstatsSummarizeSingleTMP.Rd(2 hunks)man/MSstatsSummarizeWithMultipleCores.Rd(1 hunks)man/MSstatsSummarizeWithSingleCore.Rd(2 hunks)man/dataProcess.Rd(3 hunks)man/dot-checkDataProcessParams.Rd(2 hunks)src/RcppExports.cpp(2 hunks)src/linear_summary.cpp(4 hunks)vignettes/MSstatsPlus.Rmd(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
R/RcppExports.R (1)
src/comparison_contrasts.cpp (1)
get_estimable_fixed_random(6-27)
inst/tinytest/test_groupComparison.R (1)
src/comparison_contrasts.cpp (1)
get_estimable_fixed_random(6-27)
src/RcppExports.cpp (1)
src/linear_summary.cpp (2)
get_linear_summary(169-210)get_linear_summary(169-173)
src/linear_summary.cpp (1)
src/RcppExports.cpp (10)
contrast(21-21)cov_mat(52-52)input(32-32)input(48-48)coefs(33-33)coefs(49-49)counts(35-35)counts(50-50)is_labeled(36-36)is_labeled(51-51)
🔇 Additional comments (31)
R/utils_summarization.R (1)
167-172: LGTM: add weights to non-labeled linear models.Passing weights into the non-labeled single- and multi-feature lm() calls aligns with the PR’s goal.
R/utils_imputation.R (1)
20-49: LGTM: consistently using aft_iterations for survreg control.Replacing hard-coded maxiter with
aft_iterationsacross labeled/unlabeled paths is correct.man/dot-checkDataProcessParams.Rd (1)
7-15: Approve docs update: R/utils_checks.R includes@param input_columnsmatching the function’s signature; re-run roxygen to regenerate the updated manual page.R/utils_checks.R (2)
99-105: Good guard: disallow AnomalyScores with non-linear summary.This check is appropriate and prevents silent misuse.
91-94: All call sites include the newinput_columnsparameter
Verified thatR/dataProcess.Rpassescolnames(raw)as the seventh argument and the man page (man/dot-checkDataProcessParams.Rd) documents the updated signature.man/MSstatsSummarizationOutput.Rd (1)
56-56: Example updated to 7-arg MSstatsSummarizeWithSingleCore call—LGTM.inst/tinytest/test_utils_groupComparison_model.R (1)
1-19: Good: verifies inverse-variance weights are plumbed to full_fit.R/utils_output.R (1)
35-35: Example updated to pass aft_iterations—LGTM.man/MSstatsSummarizeWithSingleCore.Rd (1)
62-63: Example update to 7 args—LGTM.man/MSstatsSummarizeSingleLinear.Rd (1)
41-41: Example aligns with new signature—LGTM.R/groupComparison.R (3)
101-104: LGTM: purely formatting for readabilityArgument wrapping improves readability; no functional change.
175-186: LGTM: early return for parallel pathThe guard and formatting are fine.
275-287: LGTM: clearer multi-line callPure formatting. No functional concerns.
inst/tinytest/test_groupComparison.R (2)
19-21: Row-count parity assertion is goodValidates multi-core equivalence at the surface level.
22-36: Weighted vs unweighted fixtures look reasonableVariance pattern should exercise weighting differences for Q96S19 and sameness for Q9UFW8.
inst/tinytest/test_dataProcess.R (8)
27-28: LGTM: basic sanity on tech reps/fractionsGood minimal check that FeatureLevelData is populated.
51-55: Error pattern check is appropriatePattern is specific enough; no change needed.
64-74: Good validations on Variance columnCovers presence, positivity, finiteness, and per-protein constancy (for Q9UFW8).
96-104: Comparative variance test is meaningfulCaptures expected behavior shift with lower anomaly scores.
106-115: Absence-of-anomaly control is soundIdentity check guards against accidental reuse of the same variance vector.
149-156: Single-protein linear summarization: good coverageParameters (including
aft_iterationsandequal_variances) are exercised; shapes validated.
171-189: Variance stratification assertions are strongClear expectation that runs 4–5 have substantially higher variance.
190-210: Proximity check to feature-level abundances is a nice consistency testSolid behavioral validation of weighted aggregation.
src/linear_summary.cpp (2)
106-112: Variance via cᵀΣc looks correctComputation is standard and efficient with Armadillo. Nice addition.
206-209: Return schema LGTMTwo-column DataFrame with "LogIntensities" and "Variance" is clear and minimal.
src/RcppExports.cpp (1)
43-56: Wrapper update and registration look correctSignature, argument forwarding, and CallEntries (arity=5) are consistent.
Please confirm R/RcppExports.R was re-generated and committed so the .Call arity matches (5 args). If helpful, I can generate a quick check script.
Also applies to: 74-74
R/utils_groupcomparison.R (1)
150-155: LGTM! Consistent weight propagation across model fitting pathsThe implementation correctly applies variance-based weights to all linear and mixed-effects models across different experimental designs (not repeated, repeated, time-course). This ensures consistent weighting throughout the group comparison workflow.
R/dataProcess.R (4)
545-553: Good convergence handling implementationThe implementation properly handles AFT model convergence warnings by catching them and setting predicted values to NA when convergence fails. This prevents silent failures and ensures data integrity.
486-488: Resolved: get_linear_summary signature matches C++ implementation
Verified that the R callget_linear_summary(single_protein, cf, counts, label, cov_mat)aligns exactly with the C++ definition insrc/linear_summary.cppand its Rcpp export insrc/RcppExports.cpp. No further action needed.
427-429: Ignore n_obs column existence concernThe columns
n_obsandn_obs_runare generated inMSstatsPrepareForSummarization(via.prepareLinear/.prepareTMPinR/utils_summarization_prepare.R) before downstream filtering, so the runtime filter on those fields indataProcess.Ris valid.Likely an incorrect or invalid review comment.
555-559: Ignore converged flag scope concern
The local variableconvergedis defined inMSstatsSummarizeSingleTMPand<<-within the warning handler correctly updates that same variable. No scoping issue remains.Likely an incorrect or invalid review comment.
| protein_level_data_no_anomalies = protein_level_data_with_anomalies | ||
| protein_level_data_no_anomalies$Variance = NULL | ||
| QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies) | ||
| QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies) | ||
| model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly, | ||
| use_log_file = FALSE) | ||
| model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly, | ||
| use_log_file = FALSE) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Ensure minimal input structure matches groupComparison expectations
groupComparison() calls MSstatsPrepareForGroupComparison(), which references FeatureLevelData$LABEL. Provide a stub FeatureLevelData to avoid NULL access in minimal tests and use named arguments.
-protein_level_data_no_anomalies$Variance = NULL
-QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
-QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
-model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
- use_log_file = FALSE)
-model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
- use_log_file = FALSE)
+protein_level_data_no_anomalies$Variance = NULL
+stub_feature = data.table::data.table(LABEL = factor("L"))
+QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies,
+ FeatureLevelData = stub_feature)
+QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies,
+ FeatureLevelData = stub_feature)
+model_no_anomaly = groupComparison(contrast.matrix = "pairwise",
+ data = QuantDataNoAnomaly,
+ use_log_file = FALSE)
+model_with_anomaly = groupComparison(contrast.matrix = "pairwise",
+ data = QuantDataWithAnomaly,
+ use_log_file = FALSE)🏁 Script executed:
#!/bin/bash
# Locate the definition of groupComparison and its signature
rg -nP 'groupComparison\s*<-' -g '*.R'
# Locate MSstatsPrepareForGroupComparison and check for FeatureLevelData$LABEL usage
rg -n 'MSstatsPrepareForGroupComparison' -g '*.R' -C5Length of output: 6387
🏁 Script executed:
#!/bin/bash
# Show lines around groupComparison usage of FeatureLevelData
sed -n '80,110p' R/groupComparison.R
# Show lines around MSstatsPrepareForGroupComparison and LABEL usage
sed -n '120,150p' R/groupComparison.RLength of output: 3049
Ensure tests supply a minimal FeatureLevelData stub
Add a stub FeatureLevelData with a LABEL column and include it in your QuantData lists; switch to named args for groupComparison():
-protein_level_data_no_anomalies$Variance = NULL
-QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies)
-QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies)
-model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly,
- use_log_file = FALSE)
-model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly,
- use_log_file = FALSE)
+protein_level_data_no_anomalies$Variance = NULL
+stub_feature = data.table::data.table(LABEL = factor("L"))
+QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies,
+ FeatureLevelData = stub_feature)
+QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies,
+ FeatureLevelData = stub_feature)
+model_no_anomaly = groupComparison(contrast.matrix = "pairwise",
+ data = QuantDataNoAnomaly,
+ use_log_file = FALSE)
+model_with_anomaly = groupComparison(contrast.matrix = "pairwise",
+ data = QuantDataWithAnomaly,
+ use_log_file = FALSE)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| protein_level_data_no_anomalies = protein_level_data_with_anomalies | |
| protein_level_data_no_anomalies$Variance = NULL | |
| QuantDataNoAnomaly = list(ProteinLevelData = protein_level_data_no_anomalies) | |
| QuantDataWithAnomaly = list(ProteinLevelData = protein_level_data_with_anomalies) | |
| model_no_anomaly = groupComparison("pairwise", QuantDataNoAnomaly, | |
| use_log_file = FALSE) | |
| model_with_anomaly = groupComparison("pairwise", QuantDataWithAnomaly, | |
| use_log_file = FALSE) | |
| protein_level_data_no_anomalies = protein_level_data_with_anomalies | |
| protein_level_data_no_anomalies$Variance = NULL | |
| # Minimal FeatureLevelData stub required by groupComparison() | |
| stub_feature = data.table::data.table(LABEL = factor("L")) | |
| QuantDataNoAnomaly = list( | |
| ProteinLevelData = protein_level_data_no_anomalies, | |
| FeatureLevelData = stub_feature | |
| ) | |
| QuantDataWithAnomaly = list( | |
| ProteinLevelData = protein_level_data_with_anomalies, | |
| FeatureLevelData = stub_feature | |
| ) | |
| model_no_anomaly = groupComparison( | |
| contrast.matrix = "pairwise", | |
| data = QuantDataNoAnomaly, | |
| use_log_file = FALSE | |
| ) | |
| model_with_anomaly = groupComparison( | |
| contrast.matrix = "pairwise", | |
| data = QuantDataWithAnomaly, | |
| use_log_file = FALSE | |
| ) |
🤖 Prompt for AI Agents
In inst/tinytest/test_groupComparison.R around lines 37 to 45, the tests call
groupComparison with QuantData lists that lack a minimal FeatureLevelData stub
and use positional args; add a FeatureLevelData element to both
QuantDataNoAnomaly and QuantDataWithAnomaly containing at least a LABEL column
(matching the peptides/features in ProteinLevelData), include it in the lists,
and call groupComparison with named arguments (e.g., data = QuantDataNoAnomaly,
design = "pairwise", use_log_file = FALSE) so the function receives the required
FeatureLevelData and parameters explicitly.
| numberOfCores = 1, | ||
| aft_iterations = 90 | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
API/docs alignment: aft_iterations added; clarify scope and enforce anomaly→linear rule.
- Ensure all public wrappers default aft_iterations = 90 consistently.
- Explicitly state whether aft_iterations affects TMP only (AFT imputation) or also linear paths.
- The text “if anomaly detection … ‘linear’ must be used” should be enforced in code (e.g., parameter validation) if not already.
To confirm consistency of defaults across the repo:
Also applies to: 71-73, 118-119
🏁 Script executed:
#!/bin/bash
rg -nP '\baft_iterations\s*=\s*90\b' -C2Length of output: 3445
Clarify aft_iterations scope and enforce anomaly→linear rule
- Docs (e.g. man/dataProcess.Rd and related *.Rd) must state that
aft_iterationsonly affects AFT/TMP imputation and has no effect on linear summarization. - In each
MSstatsSummarize…function, add a parameter check to error ifimpute="anomaly"is used withoutmethod="linear".
| output = output = MSstatsSummarizationOutput(input, summarized, processed, | ||
| method, impute, cens) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix duplicated assignment in example ("output = output =").
Change to a single assignment to avoid confusing users.
🤖 Prompt for AI Agents
In man/MSstatsSummarizationOutput.Rd around lines 57 to 58, there is a
duplicated assignment in the example ("output = output = ..."); update this to a
single assignment by removing the extra "output =" so the example shows a
single, correct assignment of the function result to the variable.
man/MSstatsSummarize.Rd
Outdated
| impute, | ||
| censored_symbol, | ||
| remove50missing, | ||
| aft_iterations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs incomplete: aft_iterations lacks an argument entry and examples use the old signature.
Add an \item for aft_iterations and update examples to include it. Also fix typos in the imputation description.
\usage{
MSstatsSummarize(
proteins_list,
method,
impute,
censored_symbol,
remove50missing,
- aft_iterations,
+ aft_iterations,
equal_variance
)
}
\arguments{
\item{proteins_list}{list of processed feature-level data}
@@
-\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'.
-TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelated failure model.
+\item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'.
+TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model.
FALSE uses the values assigned by cutoffCensored}
@@
-Null assumes that all NA intensites are randomly missing.}
+Null assumes that all NA intensities are randomly missing.}
+
+\item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.}
@@
-summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, TRUE)
+summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| aft_iterations, | |
| \usage{ | |
| MSstatsSummarize( | |
| proteins_list, | |
| method, | |
| impute, | |
| censored_symbol, | |
| remove50missing, | |
| aft_iterations, | |
| equal_variance | |
| ) | |
| } | |
| \arguments{ | |
| \item{proteins_list}{list of processed feature-level data} | |
| \item{impute}{only for summaryMethod = "TMP" and censoredInt = 'NA' or '0'. | |
| TRUE (default) imputes 'NA' or '0' (depending on censoredInt option) by Accelerated failure time (AFT) model. | |
| FALSE uses the values assigned by cutoffCensored} | |
| \item{aft_iterations}{integer; maximum iterations for AFT-based imputation (only for summaryMethod = "TMP"). Typical default is 90.} | |
| } | |
| \examples{ | |
| summarized = MSstatsSummarize(input_split, method, impute, cens, FALSE, 90, TRUE) | |
| } |
🤖 Prompt for AI Agents
In man/MSstatsSummarize.Rd around line 13, the documentation is missing an \item
entry for the aft_iterations argument and the examples still use the old
function signature; update the Rd file by adding a properly formatted
\item{aft_iterations}{<brief description>} entry describing its purpose and
default behavior, revise the Examples section to call MSstatsSummarize with the
aft_iterations parameter included, and correct typographical errors in the
imputation description (fix spelling and grammar) so the description reads
clearly and accurately.
| MSstatsSummarizeSingleLinear( | ||
| single_protein, | ||
| impute, | ||
| censored_symbol, | ||
| remove50missing, | ||
| aft_iterations = 90, | ||
| equal_variances = TRUE | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguments section incomplete for expanded signature.
Document impute, censored_symbol, remove50missing, aft_iterations, and equal_variances to match usage.
🤖 Prompt for AI Agents
In man/MSstatsSummarizeSingleLinear.Rd around lines 7–14, the Arguments section
doesn’t document the expanded signature parameters impute, censored_symbol,
remove50missing, aft_iterations, and equal_variances; add individual Rd entries
for each parameter including its name, expected type, default value (impute —
logical or method, censored_symbol — character scalar, remove50missing —
logical, aft_iterations — integer default 90, equal_variances — logical default
TRUE) and a one-sentence description of what it controls and its effect on the
function (e.g., impute: whether/how missing values are imputed; censored_symbol:
symbol representing censored observations; remove50missing: whether to drop
features with ≥50% missing; aft_iterations: number of AFT EM iterations;
equal_variances: whether to assume equal variances across groups), keeping
wording consistent with other argument entries and updating any \usage or
\examples sections if they reference the expanded signature.
| # if (method == "TMP") { | ||
| input = .prepareTMP(input, impute, censored_symbol) | ||
| # } else { | ||
| # input = .prepareLinear(input, FALSE, censored_symbol) | ||
| # } | ||
| input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: method is ignored; linear path never used.
.prepareSummary always routes to TMP, contradicting linear-weighting work and the AnomalyScores guard in utils_checks.R. This will alter behavior for linear summarization.
Restore dispatch:
-# if (method == "TMP") {
-input = .prepareTMP(input, impute, censored_symbol)
-# } else {
-# input = .prepareLinear(input, FALSE, censored_symbol)
-# }
+if (identical(method, "TMP")) {
+ input = .prepareTMP(input, impute, censored_symbol)
+} else {
+ input = .prepareLinear(input, FALSE, censored_symbol)
+}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # if (method == "TMP") { | |
| input = .prepareTMP(input, impute, censored_symbol) | |
| # } else { | |
| # input = .prepareLinear(input, FALSE, censored_symbol) | |
| # } | |
| input | |
| if (identical(method, "TMP")) { | |
| input = .prepareTMP(input, impute, censored_symbol) | |
| } else { | |
| input = .prepareLinear(input, FALSE, censored_symbol) | |
| } | |
| input |
🤖 Prompt for AI Agents
In R/utils_summarization_prepare.R around lines 110 to 115, the code
unconditionally calls .prepareTMP and ignores the method parameter so the linear
path is never used; restore proper dispatch by checking the method value (e.g.,
if (method == "TMP") { input = .prepareTMP(input, impute, censored_symbol) }
else { input = .prepareLinear(input, FALSE, censored_symbol) }) and ensure the
function returns input; also add a defensive branch or error for
unknown/unsupported method values to match the AnomalyScores guard in
utils_checks.R.
| if (all(!is.na(input$weights))){ | ||
| weight_input = input$weights | ||
| } else { | ||
| weight_input = NULL | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Guard weights: require non-missing, finite, and non-negative before using in lm().
Avoid silent acceptance of NULL/invalid weights (e.g., negative or Inf) which will error or skew fits.
Apply:
- if (all(!is.na(input$weights))){
- weight_input = input$weights
- } else {
- weight_input = NULL
- }
+ if (!is.null(input$weights) &&
+ all(is.finite(input$weights)) &&
+ all(input$weights >= 0) &&
+ all(!is.na(input$weights))) {
+ weight_input = input$weights
+ } else {
+ weight_input = NULL
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (all(!is.na(input$weights))){ | |
| weight_input = input$weights | |
| } else { | |
| weight_input = NULL | |
| } | |
| if (!is.null(input$weights) && | |
| all(is.finite(input$weights)) && | |
| all(input$weights >= 0) && | |
| all(!is.na(input$weights))) { | |
| weight_input = input$weights | |
| } else { | |
| weight_input = NULL | |
| } |
🤖 Prompt for AI Agents
In R/utils_summarization.R around lines 159 to 164, the current guard only
checks for non-NA weights and may pass invalid values (NA, NaN, Inf, negative)
to lm(); update the validation to ensure input$weights is numeric and contains
only finite, non-missing, and non-negative values before assigning weight_input,
otherwise set weight_input <- NULL; specifically reject or coerce non-numeric
inputs, check all(is.finite(weights)) and all(weights >= 0), and only then use
them in lm(), otherwise fall back to NULL to avoid errors or skewed fits.
| // [[Rcpp::export]] | ||
| NumericVector get_linear_summary(const DataFrame& input, | ||
| DataFrame get_linear_summary(const DataFrame& input, | ||
| const NumericVector& coefs, | ||
| const NumericMatrix& counts, | ||
| const bool is_labeled) { | ||
| const bool is_labeled, | ||
| const NumericMatrix& cov_mat) { | ||
| CharacterVector runs = input["RUN"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Validate cov_mat shape and alignment with coefs
Add defensive checks to fail fast on mismatched dimensions and non-square inputs.
DataFrame get_linear_summary(const DataFrame& input,
const NumericVector& coefs,
const NumericMatrix& counts,
const bool is_labeled,
const NumericMatrix& cov_mat) {
+ // Basic validation
+ if (cov_mat.nrow() != cov_mat.ncol()) {
+ stop("cov_mat must be a square matrix (n x n).");
+ }
+ if (cov_mat.nrow() != coefs.size()) {
+ stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs).");
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // [[Rcpp::export]] | |
| NumericVector get_linear_summary(const DataFrame& input, | |
| DataFrame get_linear_summary(const DataFrame& input, | |
| const NumericVector& coefs, | |
| const NumericMatrix& counts, | |
| const bool is_labeled) { | |
| const bool is_labeled, | |
| const NumericMatrix& cov_mat) { | |
| CharacterVector runs = input["RUN"]; | |
| // [[Rcpp::export]] | |
| DataFrame get_linear_summary(const DataFrame& input, | |
| const NumericVector& coefs, | |
| const NumericMatrix& counts, | |
| const bool is_labeled, | |
| const NumericMatrix& cov_mat) { | |
| // Basic validation | |
| if (cov_mat.nrow() != cov_mat.ncol()) { | |
| stop("cov_mat must be a square matrix (n x n)."); | |
| } | |
| if (cov_mat.nrow() != coefs.size()) { | |
| stop("Dimension mismatch: nrow(cov_mat) must equal length(coefs)."); | |
| } | |
| CharacterVector runs = input["RUN"]; | |
| // ... rest of function ... | |
| } |
🤖 Prompt for AI Agents
In src/linear_summary.cpp around lines 168 to 174, add defensive validation to
ensure cov_mat is square and its dimensions match the number of coefficients:
check cov_mat.nrow() == cov_mat.ncol() and cov_mat.nrow() == coefs.size(); if
either check fails call Rcpp::stop with a clear message (e.g. "cov_mat must be
square and size must equal length(coefs)"); perform these checks early
(immediately after obtaining inputs) so the function fails fast on dimension
mismatches.
| metric.9 Possible values are `mean_decrease`, `mean_increase`, and | ||
| `dispersion_increase`. These indicate the direction a metric is changes which | ||
| indicates a drop in quality (i.e., decrease, increase, or can go in either | ||
| direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a | ||
| deviation in expected retention can be either higher or lower. | ||
| - `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo and grammar in parameter description
Stray “9” and wording nit in anomalyModelFeatureTemporal.
- metric.9 Possible values are `mean_decrease`, `mean_increase`, and
- `dispersion_increase`. These indicate the direction a metric is changes which
+ metric. Possible values are `mean_decrease`, `mean_increase`, and
+ `dispersion_increase`. These indicate the direction a metric changes which
indicates a drop in quality (i.e., decrease, increase, or can go in either📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| metric.9 Possible values are `mean_decrease`, `mean_increase`, and | |
| `dispersion_increase`. These indicate the direction a metric is changes which | |
| indicates a drop in quality (i.e., decrease, increase, or can go in either | |
| direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a | |
| deviation in expected retention can be either higher or lower. | |
| - `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The | |
| metric Possible values are `mean_decrease`, `mean_increase`, and | |
| `dispersion_increase`. These indicate the direction a metric changes which | |
| indicates a drop in quality (i.e., decrease, increase, or can go in either | |
| direction). For example, `EG.DeltaRT` uses `dispersion_increase` because a | |
| deviation in expected retention can be either higher or lower. | |
| - `runOrder`: A data.table with two columns: `R.FileName` and `Order`. The |
🤖 Prompt for AI Agents
In vignettes/MSstatsPlus.Rmd around lines 141 to 146, the parameter description
for anomalyModelFeatureTemporal contains a stray "9" after "metric." and awkward
wording; remove the stray "9" and rephrase the sentence for clarity (e.g.,
"Possible values are `mean_decrease`, `mean_increase`, and
`dispersion_increase`. These indicate the direction a metric changes that
corresponds to a drop in quality..."), and fix the surrounding grammar/wording
(e.g., "indicates the direction a metric changes" -> "indicate the direction a
metric changes") so the paragraph reads cleanly and flows correctly.
| "FG.ShapeQualityScore (MS1)", | ||
| "EG.ApexRT") | ||
| anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid triple-colon usage in vignette
MSstatsConvert:::.standardizeColnames is internal; vignettes should not rely on internals. Show a documented alternative or inline a simple sanitizer.
-anomalyModelFeatures = MSstatsConvert:::.standardizeColnames(anomalyModelFeatures)
+# Example: simple standardization without relying on internals
+anomalyModelFeatures = gsub("\\s+", "", anomalyModelFeatures) # drop spaces
+anomalyModelFeatures = gsub("\\.", "", anomalyModelFeatures) # drop dots
+anomalyModelFeatures = gsub("[^A-Za-z0-9()]", "", anomalyModelFeatures) # keep alnum/()
+# Or, if MSstatsConvert exposes a public helper, prefer that here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
R/dataProcess.R (1)
220-225: Export aft_iterations to workers in PSOCK clustersaft_iterations is used inside parLapply closures but not exported to the cluster; this may cause “object 'aft_iterations' not found” on some setups.
Apply:
parallel::clusterExport(cl, c("MSstatsSummarizeSingleTMP", "MSstatsSummarizeSingleLinear", "input", "impute", "censored_symbol", - "remove50missing", "protein_indices", - "equal_variance"), + "remove50missing", "protein_indices", + "equal_variance", "aft_iterations"), envir = function_environment)
♻️ Duplicate comments (4)
R/utils_output.R (1)
111-115: Bug: .finalizeInput always routes through TMP; linear summarization path is dead.
Restoring method-based branching is necessary; otherwise linear runs get TMP-specific fields and imputation logic.Apply:
- # if (method == "TMP") { - input = .finalizeTMP(input, censored_symbol, impute, summarized) - # } else { - # input = .finalizeLinear(input, censored_symbol) - # } + if (identical(method, "TMP")) { + input = .finalizeTMP(input, censored_symbol, impute, summarized) + } else { + input = .finalizeLinear(input, censored_symbol) + }R/dataProcess.R (2)
375-393: Add robust error handling for survival fit and predict in linear pathSame concern as previously raised: failures in .fitSurvival/predict can silently produce NA and derail downstream steps. Add tryCatch and logging; also compute imputation_var only on success.
Apply:
- if (impute & any(single_protein[["censored"]])) { - survival_fit = .fitSurvival(single_protein[LABEL == "L", cols, - with = FALSE], - aft_iterations) - sigma2 = survival_fit$scale^2 - single_protein[, c("predicted", "imputation_var") := { - pred = predict(survival_fit, newdata = .SD, se.fit = TRUE) - list(pred$fit, pred$se.fit^2 + sigma2) - }] + if (impute && any(single_protein[["censored"]])) { + survival_fit <- tryCatch( + .fitSurvival(single_protein[LABEL == "L", cols, with = FALSE], + aft_iterations), + error = function(e) { + msg <- paste("*** warning: Survival model fitting failed for protein", + unique(single_protein$PROTEIN), ":", conditionMessage(e)) + getOption("MSstatsLog")("WARN", msg) + NULL + } + ) + if (!is.null(survival_fit)) { + sigma2 <- survival_fit$scale^2 + single_protein[, c("predicted", "imputation_var") := { + pr <- tryCatch( + predict(survival_fit, newdata = .SD, se.fit = TRUE), + error = function(e) { + msg <- paste("*** warning: Survival predict failed for protein", + unique(single_protein$PROTEIN), ":", conditionMessage(e)) + getOption("MSstatsLog")("WARN", msg) + list(fit = rep(NA_real_, .N), se.fit = rep(NA_real_, .N)) + } + ) + list(pr$fit, ifelse(is.na(pr$se.fit), NA_real_, pr$se.fit^2 + sigma2)) + }] + } else { + single_protein[, c("predicted", "imputation_var") := list(NA_real_, NA_real_)] + }
394-399: Validate ANOMALYSCORES before reciprocal; guard missing/invalid valuesPrevent division by zero/negatives/Inf and missing column cases; log and fall back to NA weights. Same as earlier suggestion.
Apply:
- if (all(!is.na(single_protein$ANOMALYSCORES))){ - single_protein$weights = 1 / single_protein$ANOMALYSCORES - } else { - single_protein$weights = NA - } + if ("ANOMALYSCORES" %in% colnames(single_protein) && + all(is.finite(single_protein$ANOMALYSCORES))) { + if (any(single_protein$ANOMALYSCORES <= 0)) { + msg <- paste("*** warning: Invalid (<=0) anomaly scores for protein", + unique(single_protein$PROTEIN), "- weights not applied") + getOption("MSstatsLog")("WARN", msg) + single_protein$weights <- NA_real_ + } else { + single_protein$weights <- 1 / single_protein$ANOMALYSCORES + } + } else { + single_protein$weights <- NA_real_ + }R/utils_groupcomparison.R (1)
139-147: Validate Variance before computing weights (duplicate of prior ask)Guard against zero/negative/Inf variances to avoid invalid weights and model errors.
Apply:
- if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){ - weight_input = 1/input$Variance + if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){ + if (any(input$Variance <= 0 | is.infinite(input$Variance), na.rm = TRUE)) { + msg = paste("*** warning: Invalid variance values detected for protein", + unique(input$Protein), "- weights will not be applied") + getOption("MSstatsLog")("WARN", msg) + weight_input = NULL + } else { + weight_input = 1/input$Variance + } } else { weight_input = NULL }
🧹 Nitpick comments (11)
R/utils_output.R (1)
33-33: Use a named argument for the new 7th parameter in the example.
Keeps the example resilient to future signature shifts and clearer for users.Apply:
-#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, 100) +#' summarized = MSstatsSummarizeWithSingleCore(input, method, impute, cens, FALSE, TRUE, aft_iterations = 100)Please confirm the exported name is indeed
aft_iterations.R/dataProcess.R (4)
36-37: Clarify requirement wording for anomaly detection + linear summarizationSuggest making it explicit that anomaly-derived weights are only supported with summaryMethod = "linear", and that TMP will ignore/anonymize anomaly weights.
Apply:
-#' which is robust estimation method. "linear" uses linear mixed model. If -#' anomaly detection algorithm is performed, "linear" must be used. +#' which is a robust estimation method. "linear" uses a (mixed) linear model. +#' When anomaly detection is used (weights present), summaryMethod must be "linear"; +#' TMP does not apply anomaly-derived weights.
289-290: Examples align with new signaturesExamples compile with aft_iterations in place; consider adding one example that sets equal_variances = FALSE.
Also applies to: 355-356
394-399: Optional: normalize weights to mean 1 to stabilize optimizationIf the downstream model uses observation-level weights, scaling weights to mean 1 can improve numerical stability without changing estimates.
Example:
w <- 1 / single_protein$ANOMALYSCORES single_protein$weights <- w / mean(w, na.rm = TRUE)
333-334: R CMD check NOTE noise: declare globalsConsider adding LABEL, n_obs, n_obs_run, censored, predicted, imputation_var to the NULL globals list to suppress NOTES.
man/MSstatsSummarizeSingleLinear.Rd (2)
27-28: Grammar fix in argument description“observation are” → “observations are”.
Apply in the roxygen source (R/dataProcess.R):
-#' @param equal_variances if TRUE, observation are assumed to be homoskedastic +#' @param equal_variances if TRUE, observations are assumed to be homoskedastic
36-51: Docs match new signature; consider documenting weights behaviorSince anomaly-derived weights are now used in linear summarization, add a sentence in Description or Details noting how weights are interpreted (e.g., inverse anomaly scores, expected to be positive/finite).
R/utils_groupcomparison.R (4)
139-147: Optional: stabilize weights and avoid silent row drops
- Extremely small variances can dominate fits; scale weights to mean=1 and cap extremes.
- If any NA weights remain, lm/lmer silently drop rows; consider warning with counts and, if many are dropped, disabling weights.
Example:
- } else { - weight_input = 1/input$Variance + } else { + w <- 1/input$Variance + invalid <- !is.finite(w) | w <= 0 + if (any(invalid)) { + getOption("MSstatsLog")("WARN", + paste("*** warning:", sum(invalid), "observations have invalid weights and will be dropped")) + w[invalid] <- NA_real_ + } + # stabilize + w <- w / mean(w, na.rm = TRUE) + # optional cap + w <- pmin(w, quantile(w, 0.99, na.rm = TRUE)) + weight_input <- w }
148-152: Use scalar OR (||) to avoid vectorized logic surprisesUse || for single boolean flags.
- if (!has_tech_replicates | is_single_subject) { + if (!has_tech_replicates || is_single_subject) {
139-147: Potential mismatch between modeled data and downstream checks when weights drop rowsIf NA/invalid weights cause row exclusion, .getAllComparisons() still uses the original input to assess empty conditions and build contrasts. Consider passing the actual model frame rows (or a filtered input aligned to weights) into .getAllComparisons() to avoid inconsistencies.
Would you like me to draft a minimal change that derives the model frame with model.frame(full_fit) and threads it through to .getAllComparisons()?
Also applies to: 280-296
466-479: Parallel workers: pre-load namespaces for stability and perfExporting functions works, but pre-loading packages on workers avoids first-call overhead and surprises with S3/S4 methods.
cl = parallel::makeCluster(numberOfCores) + parallel::clusterEvalQ(cl, { + library(lme4) + library(data.table) + # library(MSstats) # optional if running from installed package + NULL + }) parallel::clusterExport(cl, c("MSstatsGroupComparisonSingleProtein", "contrast_matrix", "repeated", "groups", "samples_info", "save_fitted_models", "has_imputed"), envir = function_environment)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
R/dataProcess.R(15 hunks)R/groupComparison.R(3 hunks)R/utils_censored.R(1 hunks)R/utils_groupcomparison.R(4 hunks)R/utils_output.R(3 hunks)man/MSstatsSummarizationOutput.Rd(2 hunks)man/MSstatsSummarizeSingleLinear.Rd(2 hunks)man/MSstatsSummarizeSingleTMP.Rd(3 hunks)man/MSstatsSummarizeWithMultipleCores.Rd(2 hunks)man/MSstatsSummarizeWithSingleCore.Rd(3 hunks)man/dataProcess.Rd(3 hunks)src/linear_summary.cpp(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- R/utils_censored.R
- man/MSstatsSummarizeWithMultipleCores.Rd
- man/dataProcess.Rd
- man/MSstatsSummarizationOutput.Rd
- man/MSstatsSummarizeWithSingleCore.Rd
- man/MSstatsSummarizeSingleTMP.Rd
- R/groupComparison.R
- src/linear_summary.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (10)
R/utils_output.R (1)
13-16: Docs return block is concise and accurate.
Matches the actual return structure of MSstatsSummarizationOutput.R/dataProcess.R (6)
131-132: API change looks good; params propagated to validationAdding aft_iterations to the signature and passing colnames(raw) to .checkDataProcessParams is consistent. No action needed.
Also applies to: 142-144
480-500: Nice: convergence handling for TMP imputationCatches convergence warnings and sets predicted to NA; consistent with robust workflows.
209-212: Threading aft_iterations end-to-end looks correctSignatures updated and forwarded through both parallel and single-core paths.
Also applies to: 256-261, 294-295
205-206: Note: ensure parallel imports cover parApply variants usedThe @importFrom tag lists makeCluster/parLapply/stopCluster/clusterExport; OK as is.
422-427: Good: use vcov(fit) and pass cov_mat to get_linear_summaryThis aligns with variance propagation into DA.
366-387: No changes needed for column naming consistency
The code defines the logicalcensoredflag and then creates a numericcenindicator (0for censored,1otherwise) for use in the Surv model. Both names coexist by design and are correctly applied. (rdrr.io)R/utils_groupcomparison.R (3)
89-93: LGTM: signature formatting onlyNo functional change; keeping params stable is good for API compatibility.
168-169: LGTM: weighted lm for single-subject time-courseAppropriate to thread weights through this branch.
505-506: LGTM: single-core signature formattingNo behavior change; keeps API aligned with multi-core path.
| survival = single_protein[, c(cols, "predicted"), with = FALSE] | ||
| } else { | ||
| survival = single_protein[, cols, with = FALSE] | ||
| survival[, predicted := NA] | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Preserve and return imputation variance with survival output
imputation_var is computed but dropped from survival; include it for downstream DA that uses weights/variance.
Apply:
- survival = single_protein[, c(cols, "predicted"), with = FALSE]
+ survival = single_protein[, c(cols, "predicted", "imputation_var"), with = FALSE]
@@
- survival = single_protein[, cols, with = FALSE]
- survival[, predicted := NA]
+ survival = single_protein[, cols, with = FALSE]
+ survival[, `:=`(predicted = NA_real_, imputation_var = NA_real_)]Also applies to: 429-432
🤖 Prompt for AI Agents
In R/dataProcess.R around lines 388-393, the imputation_var column is computed
but not included in the survival data.frame; update both branches so survival
includes imputation_var (e.g., select cols plus "predicted" and "imputation_var"
in the first branch, and cols plus "imputation_var" in the else branch while
still adding predicted := NA), and make the identical change at lines ~429-432
to ensure imputation variance is preserved and returned for downstream DA
weighting/variance use.
| # fit = try(, silent = TRUE) | ||
| fit = .fitLinearModel(single_protein, is_single_feature, | ||
| is_labeled = label, equal_variances) | ||
|
|
||
| if (inherits(fit, "try-error")) { | ||
| msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN)) | ||
| msg = paste("*** error : can't fit the model for ", | ||
| unique(single_protein$PROTEIN)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore try/tryCatch around .fitLinearModel; current error check is ineffective
fit is not wrapped in try, yet you test inherits(fit, "try-error"). A model failure will raise and skip the handler. Wrap the call.
Apply:
- # fit = try(, silent = TRUE)
- fit = .fitLinearModel(single_protein, is_single_feature,
- is_labeled = label, equal_variances)
+ fit <- try(
+ .fitLinearModel(single_protein, is_single_feature,
+ is_labeled = label, equal_variances),
+ silent = TRUE
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # fit = try(, silent = TRUE) | |
| fit = .fitLinearModel(single_protein, is_single_feature, | |
| is_labeled = label, equal_variances) | |
| if (inherits(fit, "try-error")) { | |
| msg = paste("*** error : can't fit the model for ", unique(single_protein$PROTEIN)) | |
| msg = paste("*** error : can't fit the model for ", | |
| unique(single_protein$PROTEIN)) | |
| fit <- try( | |
| .fitLinearModel(single_protein, is_single_feature, | |
| is_labeled = label, equal_variances), | |
| silent = TRUE | |
| ) | |
| if (inherits(fit, "try-error")) { | |
| msg = paste("*** error : can't fit the model for ", | |
| unique(single_protein$PROTEIN)) | |
| ... |
🤖 Prompt for AI Agents
In R/dataProcess.R around lines 410-416, the call to .fitLinearModel is not
wrapped in try/tryCatch yet the code tests inherits(fit, "try-error"), so model
failures will raise and skip the handler; wrap the .fitLinearModel call in
try(..., silent = TRUE) or use tryCatch to catch errors and return an object
that can be detected (e.g., assign the error to fit with class "try-error" or a
sentinel), then keep the existing inherits(fit, "try-error") branch to handle
failures and log the message as before.
| .fitModelForGroupComparison = function(input, repeated, | ||
| is_single_subject, has_tech_replicates) { | ||
|
|
||
| if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){ | ||
| weight_input = 1/input$Variance | ||
| } else { | ||
| weight_input = NULL | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Sanity-check: tests assume full_fit$weights exists
For lm this holds; for lmer use weights(full_fit). Ensure tests only assert on lm path or adapt accordingly.
Run:
🏁 Script executed:
#!/bin/bash
# Locate tests that access full_fit$weights and confirm they don't assume lmer objects
rg -nP -C2 'full_fit\\$weights|weights\\(full_fit\\)' inst testsLength of output: 0
🏁 Script executed:
#!/bin/bash
# List directory structure to locate test folders
fd -t d .
# Search for direct references to full_fit$weights in test scripts
rg -n 'full_fit\$weights' -C2 --glob '*test*.R'
# Search for calls to weights(full_fit) in test scripts
rg -n 'weights\s*\(\s*full_fit\s*\)' -C2 --glob '*test*.R'Length of output: 871
Adapt tests to handle weights in lmer models
Inst/tinytest/test_utils_groupComparison_model.R (lines 14, 17) reference result$full_fit$weights, which exists for lm fits but not for lmer. Either assign full_fit$weights <- weights(full_fit) in the mixed-effects branch or update the tests to call weights(full_fit) only on the lmer path.
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 139-147, the code sets weight_input
but downstream tests expect result$full_fit$weights to exist for lm fits while
lmer fits don't expose that field; in the mixed-effects branch assign the model
weights into the fitted object (e.g. set full_fit$weights <- weights(full_fit)
after fitting the lmer model) so result$full_fit$weights is present, or
alternatively update the tests to call weights(full_fit) only for lmer
objects—implement the former by adding that assignment immediately after the
lmer fit.
| full_fit = suppressMessages(try( | ||
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input), | ||
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), | ||
| data = input, weights=weight_input), | ||
| TRUE | ||
| )) | ||
| df_full = suppressMessages(try( | ||
| lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual, | ||
| lm(ABUNDANCE ~ GROUP + SUBJECT, | ||
| data = input, weights=weight_input)$df.residual, | ||
| TRUE | ||
| )) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Harden df_full computation when lmer path is used
If the lm fallback fails, df_full becomes a try-error and later code treats it as numeric. Provide a safe fallback.
- df_full = suppressMessages(try(
- lm(ABUNDANCE ~ GROUP + SUBJECT,
- data = input, weights=weight_input)$df.residual,
- TRUE
- ))
+ df_full = suppressMessages(try(
+ lm(ABUNDANCE ~ GROUP + SUBJECT,
+ data = input, weights=weight_input)$df.residual,
+ TRUE
+ ))
+ if (inherits(df_full, "try-error")) {
+ df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+ error = function(e) NA_real_)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| full_fit = suppressMessages(try( | |
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), data = input), | |
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), | |
| data = input, weights=weight_input), | |
| TRUE | |
| )) | |
| df_full = suppressMessages(try( | |
| lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual, | |
| lm(ABUNDANCE ~ GROUP + SUBJECT, | |
| data = input, weights=weight_input)$df.residual, | |
| TRUE | |
| )) | |
| } | |
| full_fit = suppressMessages(try( | |
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), | |
| data = input, weights=weight_input), | |
| TRUE | |
| )) | |
| df_full = suppressMessages(try( | |
| lm(ABUNDANCE ~ GROUP + SUBJECT, | |
| data = input, weights=weight_input)$df.residual, | |
| TRUE | |
| )) | |
| if (inherits(df_full, "try-error")) { | |
| df_full <- tryCatch( | |
| stats::nobs(full_fit) - length(lme4::fixef(full_fit)), | |
| error = function(e) NA_real_ | |
| ) | |
| } |
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 153 to 163, the df_full assignment
from the lm fallback can produce a "try-error" when the lm call fails but later
code assumes df_full is numeric; update the logic to detect a failed try (e.g.
inherits(..., "try-error") or is.null) and provide a safe numeric fallback (such
as NA_real_ or a computed residual df from model dimensions) before it is used
downstream; ensure the check covers both full_fit and df_full failure cases and
that df_full is explicitly coerced to numeric when successful.
| lme4::lmer(ABUNDANCE ~ GROUP + (1|SUBJECT), | ||
| data = input, weights=weight_input), | ||
| TRUE | ||
| )) | ||
| df_full = suppressMessages(try( | ||
| lm(ABUNDANCE ~ GROUP + SUBJECT, data = input)$df.residual, | ||
| lm(ABUNDANCE ~ GROUP + SUBJECT, | ||
| data = input, weights=weight_input)$df.residual, | ||
| TRUE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Repeat df_full guard in this branch
Mirror the safe fallback after the lm() try, as above.
- df_full = suppressMessages(try(
- lm(ABUNDANCE ~ GROUP + SUBJECT,
- data = input, weights=weight_input)$df.residual,
- TRUE))
+ df_full = suppressMessages(try(
+ lm(ABUNDANCE ~ GROUP + SUBJECT,
+ data = input, weights=weight_input)$df.residual,
+ TRUE))
+ if (inherits(df_full, "try-error")) {
+ df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+ error = function(e) NA_real_)
+ }🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 174 to 181, the df_full assignment
uses try(... )$df.residual directly which will fail when try returns an error;
mirror the safe fallback used earlier by capturing the try result, check for a
"try-error" (or use inherits(..., "try-error")) and if an error set df_full to
the same safe fallback (e.g., NA or 0) otherwise extract $df.residual; update
this branch to follow the same guard pattern as the lm() try above so df_full is
always defined safely.
| lme4::lmer( | ||
| ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT), | ||
| data = input, weights=weight_input), | ||
| TRUE | ||
| )) | ||
| df_full = suppressMessages(try( | ||
| lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, | ||
| data = input)$df.residual, | ||
| data = input, weights=weight_input)$df.residual, | ||
| TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
And here as well: protect df_full and keep behavior consistent
Add the same fallback computation for df_full in the tech-replicate branch.
- df_full = suppressMessages(try(
- lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
- data = input, weights=weight_input)$df.residual,
- TRUE
- ))
+ df_full = suppressMessages(try(
+ lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT,
+ data = input, weights=weight_input)$df.residual,
+ TRUE
+ ))
+ if (inherits(df_full, "try-error")) {
+ df_full <- tryCatch(stats::nobs(full_fit) - length(lme4::fixef(full_fit)),
+ error = function(e) NA_real_)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| lme4::lmer( | |
| ABUNDANCE ~ GROUP + (1|SUBJECT) + (1|GROUP:SUBJECT), | |
| data = input, weights=weight_input), | |
| TRUE | |
| )) | |
| df_full = suppressMessages(try( | |
| lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, | |
| data = input)$df.residual, | |
| data = input, weights=weight_input)$df.residual, | |
| TRUE | |
| # existing try to compute residual df from lm | |
| df_full = suppressMessages(try( | |
| lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, | |
| data = input, weights=weight_input)$df.residual, | |
| TRUE | |
| )) | |
| # fallback if lm() fails | |
| if (inherits(df_full, "try-error")) { | |
| df_full <- tryCatch( | |
| stats::nobs(full_fit) - length(lme4::fixef(full_fit)), | |
| error = function(e) NA_real_ | |
| ) | |
| } |
🤖 Prompt for AI Agents
In R/utils_groupcomparison.R around lines 184 to 192, df_full in the
tech-replicate branch is not protected and lacks the same fallback used
elsewhere; wrap the lm(ABUNDANCE ~ GROUP + SUBJECT + GROUP:SUBJECT, data =
input, weights=weight_input)$df.residual call in suppressMessages(try(...,
TRUE)) and if that returns an error, compute df_full using the identical
fallback logic used in the other branch (e.g., the same nrow/input-based
fallback calculation) so behavior is consistent and failures are handled
gracefully.
| } | ||
| single_protein = input[protein_indices[[i]],] | ||
| MSstatsSummarizeSingleLinear(single_protein, equal_variance) | ||
| MSstatsSummarizeSingleLinear( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equal_variance is not passed into MSstatsSummarizeSingleLinear
| single_protein = input[protein_indices[[protein_id]],] | ||
| summarized_result = MSstatsSummarizeSingleLinear(single_protein, | ||
| equal_variance) | ||
| summarized_result = MSstatsSummarizeSingleLinear( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equal_variance is not passed into MSstatsSummarizeSingleLinear
Motivation and Context
Feature branch for MSstats+ adding weights to differential analysis if anomaly detection model used in the converter. Includes vignettes and tests.
Changes
Testing
Please describe any unit tests you added or modified to verify your changes.
Checklist Before Requesting a Review
Summary by CodeRabbit
New Features
Documentation
Tests
Style