Making plots better #26

@tlscherer

Here's a 2018 book based around ggplot
https://rkabacoff.github.io/datavis/

Here's Wickham's 2016 book based on his dissertation. It mostly walks through the design philosophy of ggplot and why he built a tool that encourages you to think about graphics in the right way
http://www.ievbras.ru/ecostat/Kiril/R/Biblio_N/R_Eng/Wickham2016.pdf

Here's a chapter on graphics in the R for data science book
https://r4ds.had.co.nz/graphics-for-communication.html

Those will help with the coding details of plots, but what you really need is guidance on how to present. Tom and I learned it through brute force, watching hundreds of terrible presentations with some scattered good ones.

Tom and I will rack our brains for cites on presenting statistical results to an audience.

Let me give some quick guidance off the top of my head:
-Each plot makes an argument. One plot, one argument. For each plot I want you guys to think through exactly what argument you want that plot to make.
-Each plot should have as much detail as necessary to make that argument but no more.
-The only exception to the above is when there are simple extra concerns you might be able to answer visually and intensively.

So for example, if your argument is that the mean of this group is higher than the mean of that group, then maybe you want a bar plot. But the audience's first question might be whether those means are statistically distinguishable, so you might add confidence intervals. If you're presenting a trend line fit to data, the first question might be what the underlying support of the data is (e.g., are there only one or two data points out where the line dips low), so you decide to plot the raw points underneath the line.
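To make the bar plot case concrete, here's a minimal sketch of the numbers that would sit behind the bars and their error bars. Everything in it is an assumption for illustration: the group names and values are made up, `mean_with_ci` is a hypothetical helper, and the interval uses a normal approximation (z = 1.96), which is rough for samples this small. Non-overlapping intervals are only a quick visual heuristic, not a formal test.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, low, high) using a normal-approximation interval.

    z=1.96 gives an approximate 95% interval; for small samples a
    t-based interval would be wider.
    """
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean from the sample standard deviation.
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Made-up example data, one list of observations per group.
groups = {
    "group A": [4.1, 3.9, 4.4, 4.0, 4.2],
    "group B": [5.0, 5.3, 4.8, 5.1, 5.2],
}

# One (mean, low, high) triple per group: bar height plus error bar ends.
summary = {name: mean_with_ci(values) for name, values in groups.items()}

for name, (mean, low, high) in summary.items():
    print(f"{name}: mean={mean:.2f}, approx. 95% CI=({low:.2f}, {high:.2f})")

# Quick check the audience will do by eye: do the error bars overlap?
a_mean, a_low, a_high = summary["group A"]
b_mean, b_low, b_high = summary["group B"]
intervals_overlap = a_low <= b_high and b_low <= a_high
print("intervals overlap:", intervals_overlap)
```

The point is that the plot answers the obvious follow-up question without anyone having to ask it.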

Some issues with the plots recently:
-Poor visual vocabulary. Bars, labels, and colors are introduced in one plot as meaning one thing, and then reused to mean something completely different in the next plot. It's the equivalent of defining a new word "asdasd" and saying it means sky, and then two sentences later using it to mean front door. Like in music, or movies, or art, or a conversation, you are agreeing with the audience on a set of rules that they expect to hold for some amount of time. If you change those quickly, you piss off your audience.
-Hard to read, constantly changing labels. I'm constantly turning my head sideways to see if an axis label means what I think it means, and there are long variable labels with 3 pieces of information shoved into the same string.
-Plots that aren't self-explanatory. Someone should be able to look at our plot and know exactly what it means, even if they're not familiar with all the underlying questions and referents. If you have to stand there and explain a plot, then it's a failure.
-Including unlike things in the same plot. Sometimes we're putting our results side by side with other people's results without any clear markings to tell the difference. If 3 bars are our runs and a 4th is the original paper, there needs to be an obvious break there because those aren't the same thing.
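
One way to keep the visual vocabulary fixed and to get that obvious break between unlike things is to define the rules once and have every plot look them up. This is only a sketch under assumptions: the series names, colors, `SERIES_STYLE`, `style_for`, and `bar_positions` are all hypothetical, and the code is plotting-library agnostic (it only computes what you would hand to whatever draws the bars).

```python
# One shared style table for every plot in the deck (all values made up).
# A series keeps the same color everywhere, and "source" records whether
# a bar is one of our runs or someone else's result.
SERIES_STYLE = {
    "run 1": {"color": "#1f77b4", "source": "our runs"},
    "run 2": {"color": "#1f77b4", "source": "our runs"},
    "run 3": {"color": "#1f77b4", "source": "our runs"},
    "original paper": {"color": "#7f7f7f", "source": "original paper"},
}


def style_for(series):
    """Look up the agreed style; refuse to improvise a new one."""
    try:
        return SERIES_STYLE[series]
    except KeyError:
        raise KeyError(
            f"{series!r} has no agreed style; add it to SERIES_STYLE first"
        ) from None


def bar_positions(series_names, gap=1.0):
    """Assign x positions, leaving an extra gap wherever the source changes."""
    positions = {}
    x = 0.0
    previous_source = None
    for name in series_names:
        source = style_for(name)["source"]
        if previous_source is not None and source != previous_source:
            x += gap  # visible break between unlike things
        positions[name] = x
        x += 1.0
        previous_source = source
    return positions


positions = bar_positions(["run 1", "run 2", "run 3", "original paper"])
for name, x in positions.items():
    print(f"{name}: x={x}, color={style_for(name)['color']}")
```

Because every plot goes through the same table, a color or label can't quietly change meaning from one plot to the next, and anything without an agreed style fails loudly instead of getting a one-off look.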
