-
Notifications
You must be signed in to change notification settings - Fork 42
fix: stabilize DE Spearman correlation floats #214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Cast DE fold-change columns to Float64 at correlation time and add a regression test for mixed Float32/Float64 inputs.
Summary of ChangesHello @giovp, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical stability issue in the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request fixes a bug in the Spearman correlation calculation when dealing with mixed float types by casting the columns to Float64. It also adds a regression test. The fix itself is correct and robust. However, the new regression test is flawed as it doesn't correctly set up the mixed-float-type scenario it intends to test, and its assertions are weak. I've provided a suggestion to improve the test to ensure it's effective.
| def test_de_spearman_lfc_mixed_float_types() -> None: | ||
| """Regression test: DESpearmanLFC handles mixed Float32/Float64 columns.""" | ||
| real_df = pl.DataFrame( | ||
| { | ||
| "target": ["pert1", "pert1", "pert2", "pert2"], | ||
| "feature": ["gene1", "gene2", "gene1", "gene2"], | ||
| "fold_change": [1.5, 2.0, 0.5, 1.2], | ||
| "p_value": [0.01, 0.02, 0.03, 0.04], | ||
| "fdr": [0.01, 0.02, 0.03, 0.04], | ||
| } | ||
| ) | ||
|
|
||
| pred_df = real_df.with_columns( | ||
| pl.col("fold_change").cast(pl.Float64) | ||
| ) | ||
|
|
||
| comparison = DEComparison( | ||
| real=DEResults(real_df, name="real"), | ||
| pred=DEResults(pred_df, name="pred"), | ||
| ) | ||
|
|
||
| result = DESpearmanLFC(fdr_threshold=0.05)(comparison) | ||
|
|
||
| assert isinstance(result, dict) | ||
| assert all(isinstance(value, (int, float)) for value in result.values()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regression test is intended to check for issues with mixed float types, but it doesn't seem to set up the test data correctly to achieve this. Additionally, the assertions could be stronger.
-
Incorrect test setup: The
DEResultsclass's__post_init__method casts thefold_changecolumn topl.Float32for bothrealandpreddataframes. This means the mixedFloat32/Float64scenario described in the PR is not actually being tested. Both dataframes will haveFloat32columns. -
Weak assertions: The test only asserts that the result is a dictionary of floats/ints. It would be more robust to assert the actual correlation values, which should be
1.0in this case for both perturbations since thefold_changevalues are identical.
Here is a suggested improvement to the test to correctly simulate the mixed-type scenario and add stronger assertions:
def test_de_spearman_lfc_mixed_float_types() -> None:
"""Regression test: DESpearmanLFC handles mixed Float32/Float64 columns."""
base_df = pl.DataFrame(
{
"target": ["pert1", "pert1", "pert2", "pert2"],
"feature": ["gene1", "gene2", "gene1", "gene2"],
"fold_change": [1.5, 2.0, 0.5, 1.2],
"p_value": [0.01, 0.02, 0.03, 0.04],
"fdr": [0.01, 0.02, 0.03, 0.04],
}
)
# DEResults casts to Float32 by default
real_res = DEResults(base_df.clone(), name="real")
# Create a pred result and then manually cast its fold_change to Float64
# to simulate the mixed-type scenario.
pred_res = DEResults(base_df.clone(), name="pred")
pred_res.data = pred_res.data.with_columns(
pl.col(pred_res.fold_change_col).cast(pl.Float64)
)
comparison = DEComparison(
real=real_res,
pred=pred_res,
)
result = DESpearmanLFC(fdr_threshold=0.05)(comparison)
expected = {"pert1": 1.0, "pert2": 1.0}
assert result == expected|
@giovp thanks! |
Cast DE fold-change columns to Float64 at correlation time and add a regression test for mixed Float32/Float64 inputs.
With latest
cell-evalandpython=3.12thede_spearman_lfc_sigcalculation failed