This repository is the Python version of the FriendsDontLetFriends project, originally implemented in R. The project focuses on data analysis and visualization using Python libraries such as Pandas, Seaborn, and Matplotlib.

dzhao2019/FriendsDontLetFriends-Python

Friends Don't Let Friends Make Bad Graphs (Python Version)

This repository is the Python version of the FriendsDontLetFriends project, originally implemented in R. The project focuses on data analysis and visualization using Python libraries such as Pandas, Seaborn, and Matplotlib.

Introduction to FriendsDontLetFriends

FriendsDontLetFriends is an R-based project created by CX Li (email: cxli233@gmail.com) that focuses on comprehensive data analysis and visualization. The project exemplifies the best practices in data science, leveraging the power of R and its extensive ecosystem of packages to provide insightful analyses and compelling visualizations. It includes a series of R Markdown files (.rmd) that guide users through various data manipulation and visualization tasks. The repository is a valuable resource for data scientists, analysts, and anyone interested in learning how to effectively work with data in R.

For more details and to explore the original project, visit the FriendsDontLetFriends GitHub repository.

Python Version of FriendsDontLetFriends

Table of Contents

  1. Means Separation
  2. Small Sample Sizes
  3. Unidirectional Data
  4. Multi-factorial Experiment
  5. Reordering Rows & Columns for Heatmap
  6. Checking outliers for Heatmap
  7. Don't Forget to Check Data Range
  8. Trying Different Layouts for Network Graphs
  9. Position-based Visualizations vs. Length-based Visualizations
  10. Pie Chart
  11. Concentric Donuts
  12. Choosing Colors: Red/Green & Rainbow Color Scales
  13. Reordering Stacked Bar Plot

1. Means Separation

Bar plot vs. Box plot vs. Dot plot

Mean separation plots are frequently used in scientific research to illustrate group differences. These plots compare two or more groups containing multiple observations and can highlight variations in their means, variances, and distributions. The main objective of these visualizations is to represent both the data's central tendency (mean) and dispersion (spread).

No Bar Plots for Means Separation

In this example, the bar plot, box plot, and swarm plot show that while the two groups have similar means and standard deviations, their distributions differ significantly. This raises the question, "Are they truly the same?" It's a reminder to avoid using bar plots alone for mean separation and to consider alternative plots that provide a fuller picture of the data.
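The comparison above can be sketched with Matplotlib and NumPy alone. This is a minimal illustration, not the repository's own notebook code: the data are synthetic, and the helper `rescale`, the group names, the seed, and the output file name are all assumptions made for the example. Both groups are forced to share the same mean and standard deviation, so only the box plot and dot plot can tell them apart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only


def rescale(x, mean=10.0, sd=2.0):
    """Shift and scale x so it has exactly the requested mean and sample SD."""
    z = (x - x.mean()) / x.std(ddof=1)
    return z * sd + mean


# Hypothetical data: one unimodal group and one bimodal group.
unimodal = rescale(rng.normal(size=60))
bimodal = rescale(np.concatenate([rng.normal(-2, 0.4, 30), rng.normal(2, 0.4, 30)]))
groups = {"unimodal": unimodal, "bimodal": bimodal}

fig, (ax_bar, ax_box, ax_dot) = plt.subplots(1, 3, figsize=(9, 3), sharey=True)

# Bar plot: only the mean and SD are drawn, so the two groups look identical.
means = [g.mean() for g in groups.values()]
sds = [g.std(ddof=1) for g in groups.values()]
ax_bar.bar(list(groups), means, yerr=sds, capsize=4)
ax_bar.set_title("Bar plot")

# Box plot: the quartiles begin to reveal the difference.
ax_box.boxplot(list(groups.values()))
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(list(groups))
ax_box.set_title("Box plot")

# Jittered dot plot: every observation is visible.
for i, g in enumerate(groups.values()):
    ax_dot.scatter(i + rng.uniform(-0.15, 0.15, g.size), g, s=10)
ax_dot.set_xticks([0, 1])
ax_dot.set_xticklabels(list(groups))
ax_dot.set_title("Dot plot")

fig.savefig("means_separation.png", dpi=150)
```

Seaborn's `boxplot` and `swarmplot` would give a similar result with less code; plain Matplotlib is used here only to keep the sketch dependency-light.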

2. Small Sample Sizes

Violin plots or any smoothed distribution curves are unreliable for small sample sizes. When sample sizes are small, distributions and quartiles can vary widely, even if the data points are similar. These measures only become meaningful when sample sizes are larger, generally stabilizing when n exceeds 50.

Violin_plot_for_small_n

Violin plot vs. Box plot vs. Strip plot
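One way to see the instability described in this section is to draw many samples from the same population and check how much an estimated quartile moves around. The sketch below is an illustration under stated assumptions: a standard normal population, sample sizes of 5 and 50, and a hypothetical helper `quartile_spread`. None of these come from the repository's notebooks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only


def quartile_spread(n, repeats=500):
    """SD of the estimated upper quartile across repeated samples of size n."""
    q75 = [np.percentile(rng.normal(size=n), 75) for _ in range(repeats)]
    return float(np.std(q75))


spread_small = quartile_spread(5)
spread_large = quartile_spread(50)
print(f"SD of the Q3 estimate, n=5: {spread_small:.2f}; n=50: {spread_large:.2f}")

# Four samples from the same population at each sample size.
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, n in zip(axes, (5, 50)):
    samples = [rng.normal(size=n) for _ in range(4)]
    ax.violinplot(samples, showmedians=True)
    # Overlay the raw points so the reader can see how little data shapes each violin.
    for i, s in enumerate(samples, start=1):
        ax.scatter(np.full(n, i), s, s=8, color="black", zorder=3)
    ax.set_title(f"n = {n}")

fig.savefig("violin_small_n.png", dpi=150)
```

At n = 5 the four violins typically look like four different distributions even though they share one population, which is why showing the individual points is the safer choice for small samples.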

3. Unidirectional Data

Using color gradients in data visualization requires careful consideration. The darkest and lightest colors in a gradient should have specific meanings, such as representing the maximum, minimum, or zero. A common mistake in visualizing data is applying arbitrary colors to values, which can mislead the viewer. This error is as misleading as having the longest bar in a bar chart not represent the largest value.

Divergent_gradient_for_unidirectional_data
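The point about gradients can be illustrated by drawing the same non-negative data with a diverging and a sequential colormap. This is a sketch under assumptions: the arrays `counts` and `change` are synthetic, and the colormap choices (`RdBu_r`, `viridis`) are examples rather than the ones used in the repository. The third panel shows a case where a diverging map is appropriate because zero is meaningful and the midpoint is pinned to it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only

# Hypothetical data: `counts` is unidirectional (>= 0), `change` spans both sides of 0.
counts = rng.gamma(shape=2.0, scale=10.0, size=(8, 8))
change = rng.normal(loc=0.5, scale=1.5, size=(8, 8))

fig, axes = plt.subplots(1, 3, figsize=(11, 3))

# Misleading: the pale midpoint of a diverging map lands on an arbitrary value.
im0 = axes[0].imshow(counts, cmap="RdBu_r")
axes[0].set_title("Diverging on unidirectional data")

# Clearer: a sequential map anchored at zero, so each end of the scale has a meaning.
im1 = axes[1].imshow(counts, cmap="viridis", vmin=0)
axes[1].set_title("Sequential, starting at 0")

# Appropriate use of a diverging map: the midpoint is pinned to a meaningful zero.
norm = TwoSlopeNorm(vmin=change.min(), vcenter=0.0, vmax=change.max())
im2 = axes[2].imshow(change, cmap="RdBu_r", norm=norm)
axes[2].set_title("Diverging, centered at 0")

for ax, im in zip(axes, (im0, im1, im2)):
    fig.colorbar(im, ax=ax)

fig.savefig("gradient_choice.png", dpi=150)
```

`TwoSlopeNorm` requires `vmin < vcenter < vmax`, so the third panel only works when the data actually fall on both sides of zero.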

4. Multi-factorial Experiment

Bar plot vs. Dot plot

5. Reordering Rows & Columns for Heatmap

6. Checking outliers for Heatmap

7. Don't Forget to Check Data Range

8. Trying Different Layouts for Network Graphs

9. Position-based Visualizations vs. Length-based Visualizations

10. Pie Chart

Pie charts vs. Donut charts vs. Stacked bars

11. Concentric Donuts

Concentric Donut plot vs. Stacked Bar plot

12. Choosing Colors: Red/Green & Rainbow Color Scales

13. Reordering Stacked Bar Plot

14. Mean Separation
