Remove dupes (expands on other PR, don't merge) #3
Open
theelk801 wants to merge 2 commits into MediaArchaeologyLab:master from
Member
Out of curiosity, do you have an example item where "The same is done if one row's entry is a substring of the other"?
Contributor, Author
Yeah, an example would be this one. One of them is "Romancing the Throne" and the other is "King’s Quest II: Romancing the Throne", so one string is a substring of the other and the superstring can replace it.
In this PR I've taken each accession number that appears exactly twice and combined the two records as follows:
Given two rows with the same accession number, if one row has an entry in a column where the other has none, that entry is copied into the other row. The same is done if one row's entry is a substring of the other's: the longer entry replaces the shorter one. If the entries can't be reconciled cleanly like this, both rows keep their separate values. No rows are removed unless they were missing an accession number.
This PR probably doesn't need to be merged; it's just a handy way to see the differences this process creates.
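For anyone who wants to experiment with the rule described above, here is a rough Python sketch. It is not the script used for this PR. It assumes rows are dicts of strings, that empty cells are empty strings, and that the accession number lives in a column named `accession`; the function names and that column name are hypothetical.

```python
from collections import defaultdict


def reconcile_pair(a, b):
    """Return updated copies of two rows that share an accession number."""
    a, b = dict(a), dict(b)
    for col in a.keys() | b.keys():
        x, y = a.get(col, ""), b.get(col, "")
        if x in y:
            # x is empty, equal to y, or a substring of y: both rows take y
            a[col] = b[col] = y
        elif y in x:
            # y is empty or a substring of x: both rows take x
            a[col] = b[col] = x
        # otherwise the entries conflict and each row keeps its own value
    return a, b


def merge_duplicates(rows, key="accession"):
    """Apply reconcile_pair to every accession number that appears exactly twice."""
    # Rows without an accession number are the only ones dropped.
    rows = [dict(r) for r in rows if r.get(key)]
    positions = defaultdict(list)
    for i, row in enumerate(rows):
        positions[row[key]].append(i)
    for idxs in positions.values():
        if len(idxs) == 2:
            i, j = idxs
            rows[i], rows[j] = reconcile_pair(rows[i], rows[j])
    return rows
```

Because an empty string is a substring of every string, the "missing entry" case and the "substring" case fall out of the same `in` check. Both rows are kept after reconciliation, which matches the note that no rows are removed except those lacking an accession number.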