Conversation
Mostly minor refactoring, but making a copy of the 'data' df is important to avoid a SettingWithCopyWarning when changing a column's dtype.
Mutating cols based on their own values is bad practice because the cell is not repeatable (re-running it changes the result again); it is better to create a new col with the results.
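For anyone following along, here is a minimal sketch of the two points above. The frame `raw` and the columns `season` / `season_int` are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's source data
raw = pd.DataFrame({"season": ["1", "2", "2"], "title": ["a", "b", "c"]})

# Take an explicit copy of the filtered slice, so later assignments act on
# an independent frame rather than on a possible view of `raw`
data = raw[raw["season"] != "1"].copy()

# Write the converted values to a new column instead of overwriting `season`,
# so re-running this cell always gives the same result
data["season_int"] = data["season"].astype(int)
```

Because `season` itself is never overwritten, the cell can be run any number of times without changing its output.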
|
I read Sam's presentation and now see that we want to convert the dataframes to 'tidy long form', not just find the quickest / easiest way to calculate 'number of colors' & 'number of description words'. So let me have a look at doing that... your original solution may have been the right approach! |
My prior method was a shortcut that works for this problem statement, but isn't a 'data wrangling' best practice (make your dataframes tidy, e.g. 1 measurement per row). Now we show both approaches.
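A rough sketch of what "1 measurement per row" could look like, assuming the colors arrive as a comma-separated string per painting. The frame `wide` and its columns are hypothetical, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical wide data: one row per painting, all colors packed in one cell
wide = pd.DataFrame({
    "title": ["Sunset", "Forest"],
    "colors": ["red,orange,black", "green,brown"],
})

# Tidy long form: split the string into a list, then explode to one color per row
tidy = (
    wide.assign(color=wide["colors"].str.split(","))
        .explode("color")
        .drop(columns="colors")
        .reset_index(drop=True)
)

# With tidy data, 'number of colors' is just a group count
n_colors = tidy.groupby("title")["color"].count()
```

The shortcut approach would count the separators in the string directly; the tidy version takes a few more lines but leaves a frame that is easy to group, filter, and plot.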
Jupyter autosave... so slow.
|
I'm still happy to merge this if you'd like @L4R5m. I really like the comments, headings, and formatting. The big challenge is that Sam and I were trying to follow a very simple format of 1) read the data 2) wrangle it 3) visualize so that newcomers would be able to follow. I think we need to distill this so that it maps more closely to Sam's solution here https://github.com/samanthacsik/RLadiesSB-RvsPython/blob/main/KEY_Rcode.R. Can we get it down to three simple steps? |
Simplified the notebook to show only one method of analyzing the data. Also added a few more comments and cleaned up formatting.
|
Sounds good, I updated the Notebook to remove one of the analysis methods so now it is just based on the 'tidy' dataframe method. Feel free to edit further if you want to simplify more. |
|
I don't think I can modify the pull request since you forked @L4R5m. I did my best to incorporate your code into key-2. @samanthacsik and @an-bui feel free to give feedback here too. Some notes:
|
Changes to key.ipynb: