Skip to content

Student question: Discrepancy between manually calculated CI vs estimate_means() #19

@fontikar

Description

@fontikar

Flagged by Sik Tsang

A reprex using penguins data

library(tidyverse)
#> Warning: package 'purrr' was built under R version 4.3.3
#> Warning: package 'lubridate' was built under R version 4.3.3
library(easystats)
#> Warning: package 'easystats' was built under R version 4.3.3
#> # Attaching packages: easystats 0.7.4 (red = needs update)
#> ✔ bayestestR  0.15.2   ✔ correlation 0.8.7 
#> ✖ datawizard  1.0.1    ✔ effectsize  1.0.0 
#> ✔ insight     1.1.0    ✔ modelbased  0.10.0
#> ✔ performance 0.13.0   ✔ parameters  0.24.2
#> ✔ report      0.6.1    ✔ see         0.11.0
#> 
#> Restart the R-Session and update packages with `easystats::easystats_update()`.
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.3.3

## Here, I will test for species differences in bill_length_mm using a lm

fit_penguins <- lm(bill_length_mm ~ species, data = penguins)

estimate_means(fit_penguins)
#> We selected `by=c("species")`.
#> Estimated Marginal Means
#> 
#> species   |  Mean |   SE |         95% CI | t(339)
#> --------------------------------------------------
#> Adelie    | 38.79 | 0.24 | [38.32, 39.27] | 161.05
#> Chinstrap | 48.83 | 0.36 | [48.13, 49.54] | 136.05
#> Gentoo    | 47.50 | 0.27 | [46.98, 48.03] | 178.00
#> 
#> Variable predicted: bill_length_mm
#> Predictors modulated: species

# Try out options - these are all the same
# estimate_means(fit_penguins, estimate = "average")
# estimate_means(fit_penguins, estimate = "population")
# estimate_means(fit_penguins, estimate = "specific")
# estimate_means(fit_penguins, estimate = "typical")

# Try out options - these are all the same
# estimate_means(fit_penguins, backend = "emmeans")
# estimate_means(fit_penguins, backend = "marginaleffects")

## Here, I will hand calcuate means, SE, CIs

penguins |> 
  group_by(species) |> 
  summarise(mean = mean(bill_length_mm, na.rm = TRUE), 
            n = n(),
            se = sd(bill_length_mm, na.rm = TRUE)/sqrt(n),
            t_statistic = qt(0.975, df = n - 1),
            CI_low = mean-(t_statistic*se),
            CI_high = mean+(t_statistic*se)
  )
#> # A tibble: 3 × 7
#>   species    mean     n    se t_statistic CI_low CI_high
#>   <fct>     <dbl> <int> <dbl>       <dbl>  <dbl>   <dbl>
#> 1 Adelie     38.8   152 0.216        1.98   38.4    39.2
#> 2 Chinstrap  48.8    68 0.405        2.00   48.0    49.6
#> 3 Gentoo     47.5   124 0.277        1.98   47.0    48.1

Created on 2025-04-01 with reprex v2.1.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions