This repository is the Python version of the FriendsDontLetFriends project, originally implemented in R. The project focuses on data analysis and visualization using Python libraries such as Pandas, Seaborn, and Matplotlib.
FriendsDontLetFriends is an R-based project created by CX Li (email: cxli233@gmail.com) that focuses on comprehensive data analysis and visualization. The project exemplifies the best practices in data science, leveraging the power of R and its extensive ecosystem of packages to provide insightful analyses and compelling visualizations. It includes a series of R Markdown files (.rmd) that guide users through various data manipulation and visualization tasks. The repository is a valuable resource for data scientists, analysts, and anyone interested in learning how to effectively work with data in R.
For more details and to explore the original project, visit the FriendsDontLetFriends GitHub repository.
- Means Separation
- Small Sample Sizes
- Unidirectional Data
- Multi-factorial Experiment
- Reordering Rows & Columns for Heatmap
- Checking outliers for Heatmap
- Don't Forget to Check Data Range
- Trying Different Layouts for Network Graphs
- Position-based Visualizations vs. Length-based Visualizations
- Pie Chart
- Concentric Donuts
- Choosing Colors: Red/Green & Rainbow Color Scales
- Reordering Stacked Bar Plot
Bar plot vs. Box plot vs. Dot plot
Mean separation plots are frequently used in scientific research to illustrate group differences. These plots compare two or more groups containing multiple observations and can highlight variations in their means, variances, and distributions. The main objective of these visualizations is to represent both the data's central tendency (mean) and dispersion (spread).
In this example, the bar plot, box plot, and swarm plot show that while the two groups have similar means and standard deviations, their distributions differ significantly. This raises the question, "Are they truly the same?" It's a reminder to avoid using bar plots alone for mean separation and to consider alternative plots that provide a fuller picture of the data.
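The comparison described above can be reproduced with a minimal sketch like the following. It is an illustration, not the repository's own script: the helper names `make_groups` and `plot_mean_separation`, the group labels, and the simulated values are all assumptions. Both groups are rescaled to the same mean and standard deviation, but one is unimodal and the other bimodal, so only the box and swarm panels reveal the difference.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def make_groups(n=60, mean=10.0, sd=2.0, seed=0):
    """Simulate two groups with identical mean and SD but different shapes.

    Hypothetical data: group A is unimodal, group B is bimodal.
    """
    rng = np.random.default_rng(seed)
    unimodal = rng.normal(size=n)
    bimodal = np.concatenate(
        [rng.normal(-1, 0.2, n // 2), rng.normal(1, 0.2, n - n // 2)]
    )

    def rescale(x):
        # Force the sample mean and sample SD to the requested values.
        return mean + sd * (x - x.mean()) / x.std(ddof=1)

    return pd.DataFrame(
        {
            "group": ["A"] * n + ["B"] * n,
            "value": np.concatenate([rescale(unimodal), rescale(bimodal)]),
        }
    )


def plot_mean_separation(df):
    """Draw the same data as a bar plot, a box plot, and a swarm plot."""
    fig, axes = plt.subplots(1, 3, figsize=(9, 3))

    # Bar plot: mean with SD error bars, which hides the distribution shape.
    summary = df.groupby("group")["value"].agg(["mean", "std"])
    axes[0].bar(list(summary.index), summary["mean"], yerr=summary["std"], capsize=4)
    axes[0].set_title("Bar plot")

    # Box plot: shows median and quartiles.
    sns.boxplot(data=df, x="group", y="value", ax=axes[1])
    axes[1].set_title("Box plot")

    # Swarm plot: shows every observation.
    sns.swarmplot(data=df, x="group", y="value", size=3, ax=axes[2])
    axes[2].set_title("Swarm plot")

    fig.tight_layout()
    return fig
```

Calling `plot_mean_separation(make_groups())` returns a Matplotlib figure that can be displayed with `plt.show()` or written to disk with `fig.savefig(...)`.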
Violin plots or any smoothed distribution curves are unreliable for small sample sizes. When sample sizes are small, distributions and quartiles can vary widely, even if the data points are similar. These measures only become meaningful when sample sizes are larger, generally stabilizing when n exceeds 50.
Violin plot vs. Box plot vs. Strip plot
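One way to see why quartiles are unstable at small sample sizes is a short simulation. The sketch below is an assumption-laden illustration (the function name `quartile_spread`, the standard normal population, and the number of repeats are all chosen here, not taken from the project): it draws many samples of size n from the same population and measures how much the estimated quartiles vary from sample to sample.

```python
import numpy as np


def quartile_spread(n, repeats=2000, seed=0):
    """Return the SD of the Q1 and Q3 estimates across repeated samples of size n.

    All samples come from the same standard normal population, so any
    variation in the quartiles is due to sample size alone.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=(repeats, n))
    # One Q1 and one Q3 estimate per simulated sample.
    q1, q3 = np.percentile(samples, [25, 75], axis=1)
    return float(q1.std()), float(q3.std())


for n in (5, 20, 50, 200):
    sd_q1, sd_q3 = quartile_spread(n)
    print(f"n={n:>3}: SD of Q1 estimate = {sd_q1:.2f}, SD of Q3 estimate = {sd_q3:.2f}")
```

The spread of the quartile estimates shrinks as n grows, which is why box and violin plots drawn from a handful of points can look very different even when the underlying population is the same.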
Using color gradients in data visualization requires careful consideration. The darkest and lightest colors in a gradient should have specific meanings, such as representing the maximum, minimum, or zero. A common mistake in visualizing data is applying arbitrary colors to values, which can mislead the viewer. This error is as misleading as having the longest bar in a bar chart not represent the largest value.
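A minimal sketch of anchoring the ends of a color gradient to meaningful values might look like this. The helper names `pick_color_scale` and `plot_heatmap` and the choice of the `viridis` and `RdBu_r` colormaps are assumptions made for illustration. The rule used here is simple: if all values are non-negative, use a sequential scale that starts at zero; otherwise use a diverging scale with symmetric limits so that the midpoint color means zero.

```python
import numpy as np
import matplotlib.pyplot as plt


def pick_color_scale(values):
    """Choose colormap and limits so the extreme colors carry meaning."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if lo >= 0:
        # Unidirectional data: one end of the scale is zero, the other the maximum.
        return {"cmap": "viridis", "vmin": 0.0, "vmax": float(hi)}
    # Data on both sides of zero: symmetric limits put zero at the midpoint color.
    bound = float(max(abs(lo), abs(hi)))
    return {"cmap": "RdBu_r", "vmin": -bound, "vmax": bound}


def plot_heatmap(values):
    """Draw a heatmap whose color limits come from pick_color_scale."""
    fig, ax = plt.subplots(figsize=(4, 3))
    image = ax.imshow(values, aspect="auto", **pick_color_scale(values))
    fig.colorbar(image, ax=ax, label="value")
    return fig
```

For example, `plot_heatmap([[0.5, 2.0], [1.0, 4.0]])` uses a sequential scale from 0 to 4, while data containing negative values would get a diverging scale centered on zero.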
Bar plot vs. Dot plot
Pie charts vs. Donut charts vs. Stacked bars
Concentric Donut plot vs. Stacked Bar plot


