fix(pd-converter): Remove QuanInfo column after cleaning up PD data #106
Motivation and Context
https://groups.google.com/g/msstats/c/8OrKxfxMxOo - The PD converter does not remove the QuanInfo column from the input data frame.
QuanInfo is a PD-specific column and is not referenced anywhere else in the code. However, this column causes problems when summarizing multiple PSMs.
Workflow:
- Multiple PSMs are summarized using max. If duplicate PSMs have equal max intensities, we cannot determine which PSM to use to summarize a feature.
- If QuanInfo is included, the code may mistake two duplicate PSMs for PSMs of different features, leaving duplicate PSMs in the input data.
- proteinSummarization/dataProcess crashes when there are multiple PSMs of the same feature.
I've determined the best solution is to remove QuanInfo from the input data frame during conversion.
Changes
- Remove the QuanInfo column from the PD input table after cleanup, since it is no longer needed.
Testing
Fixed existing unit tests
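The duplicate-PSM problem described under Motivation and Context can be sketched roughly as follows. This is an illustration in plain Python, not the package's own code: the helper `summarize_max`, the row layout, every column name other than QuanInfo, and the QuanInfo values are all hypothetical, and the grouping rule (every column except the intensity defines a feature) is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical PSM rows. Only the QuanInfo column name comes from the PR;
# the other columns and all values are made up for illustration.
psms = [
    {"Protein": "P1", "Peptide": "PEPTIDEK", "Charge": 2, "Run": "run1",
     "QuanInfo": "Unique", "Intensity": 1000.0},
    {"Protein": "P1", "Peptide": "PEPTIDEK", "Charge": 2, "Run": "run1",
     "QuanInfo": "Redundant", "Intensity": 1000.0},
    {"Protein": "P1", "Peptide": "LVEDGK", "Charge": 3, "Run": "run1",
     "QuanInfo": "Unique", "Intensity": 500.0},
]


def summarize_max(rows, ignore=()):
    """Keep one max intensity per group.

    The group key is every column except Intensity and the columns
    listed in `ignore` (assumed rule, for illustration only).
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(sorted(
            (name, value) for name, value in row.items()
            if name != "Intensity" and name not in ignore
        ))
        groups[key].append(row["Intensity"])
    return [dict(key, Intensity=max(values)) for key, values in groups.items()]


# With QuanInfo in the key, the two PEPTIDEK rows look like different features.
with_quaninfo = summarize_max(psms)
# With QuanInfo dropped, they collapse into a single feature.
without_quaninfo = summarize_max(psms, ignore=("QuanInfo",))

print(len(with_quaninfo))     # 3
print(len(without_quaninfo))  # 2
```

With the extra column in the grouping key, both copies of the same PSM survive summarization, which is the situation that later breaks feature-level processing. Dropping the column first lets them collapse into one row.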
Checklist Before Requesting a Review