Remove dupes (expands on other PR, don't merge) #3

Open
theelk801 wants to merge 2 commits into MediaArchaeologyLab:master from theelk801:removeDupes
Conversation

@theelk801
Contributor

On this one I've taken each instance of an accession number appearing exactly twice and combined both records as follows:

Given two rows with the same accession number, if one row has an entry in a column and the other doesn't, that entry is copied into the other row. The same is done if one row's entry is a substring of the other: the longer entry replaces the shorter one. If the entries can't be reconciled cleanly like this, each row keeps its own value. No rows are removed unless they were missing an accession number.
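
For anyone skimming the diff, here is a minimal sketch of that rule in Python. It is not the script used for this PR; it assumes each row is a dict of column name to string, and `merge_pair`, the column names other than the title, and the sample values are all made up for illustration.

```python
def merge_pair(row_a, row_b):
    """Reconcile two rows that share an accession number, column by column.

    Hypothetical helper: returns updated copies of both rows and never
    drops a row.
    """
    merged_a, merged_b = dict(row_a), dict(row_b)
    # Visit every column seen in either row, in a stable order.
    for col in dict.fromkeys([*row_a, *row_b]):
        a = (row_a.get(col) or "").strip()
        b = (row_b.get(col) or "").strip()
        if a == b:
            continue  # already in agreement (or both empty)
        if not a or a in b:
            # a is empty or a substring of b: b wins in both rows.
            merged_a[col] = merged_b[col] = b
        elif not b or b in a:
            # b is empty or a substring of a: a wins in both rows.
            merged_a[col] = merged_b[col] = a
        # Otherwise the values conflict and each row keeps its own.
    return merged_a, merged_b


# Illustrative rows; the accession number, location and condition are invented.
row_a = {"accession": "X-001", "title": "Romancing the Throne",
         "location": "", "condition": "Good"}
row_b = {"accession": "X-001", "title": "King’s Quest II: Romancing the Throne",
         "location": "Shelf A", "condition": "Fair"}

new_a, new_b = merge_pair(row_a, row_b)
```

With these sample rows, both copies end up with the longer title and the filled-in location, while the conflicting `condition` values stay as they were in each row.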

This PR probably doesn't need to be merged; it's just a handy way to see the differences created by this process.

@ericmagnuson
Member

Out of curiosity, do you have an example item where "The same is done if one row's entry is a substring of the other"?

@theelk801
Contributor Author

Yeah, an example would be this one. One of them is "Romancing the Throne" and the other is "King’s Quest II: Romancing the Throne", so one string is a substring of the other and the longer string can replace the shorter one.

2 participants