Student question: Discrepancy between manually calculated CI vs `estimate_means()`

Flagged by Sik Tsang 

A reprex using penguins data

``` r
library(tidyverse)
#> Warning: package 'purrr' was built under R version 4.3.3
#> Warning: package 'lubridate' was built under R version 4.3.3
library(easystats)
#> Warning: package 'easystats' was built under R version 4.3.3
#> # Attaching packages: easystats 0.7.4 (red = needs update)
#> ✔ bayestestR  0.15.2   ✔ correlation 0.8.7 
#> ✖ datawizard  1.0.1    ✔ effectsize  1.0.0 
#> ✔ insight     1.1.0    ✔ modelbased  0.10.0
#> ✔ performance 0.13.0   ✔ parameters  0.24.2
#> ✔ report      0.6.1    ✔ see         0.11.0
#> 
#> Restart the R-Session and update packages with `easystats::easystats_update()`.
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.3.3

## Here, I will test for species differences in bill_length_mm using a lm

fit_penguins <- lm(bill_length_mm ~ species, data = penguins)

estimate_means(fit_penguins)
#> We selected `by=c("species")`.
#> Estimated Marginal Means
#> 
#> species   |  Mean |   SE |         95% CI | t(339)
#> --------------------------------------------------
#> Adelie    | 38.79 | 0.24 | [38.32, 39.27] | 161.05
#> Chinstrap | 48.83 | 0.36 | [48.13, 49.54] | 136.05
#> Gentoo    | 47.50 | 0.27 | [46.98, 48.03] | 178.00
#> 
#> Variable predicted: bill_length_mm
#> Predictors modulated: species

# Try out options - these are all the same
# estimate_means(fit_penguins, estimate = "average")
# estimate_means(fit_penguins, estimate = "population")
# estimate_means(fit_penguins, estimate = "specific")
# estimate_means(fit_penguins, estimate = "typical")

# Try out options - these are all the same
# estimate_means(fit_penguins, backend = "emmeans")
# estimate_means(fit_penguins, backend = "marginaleffects")

## Here, I will hand calcuate means, SE, CIs

penguins |> 
  group_by(species) |> 
  summarise(mean = mean(bill_length_mm, na.rm = TRUE), 
            n = n(),
            se = sd(bill_length_mm, na.rm = TRUE)/sqrt(n),
            t_statistic = qt(0.975, df = n - 1),
            CI_low = mean-(t_statistic*se),
            CI_high = mean+(t_statistic*se)
  )
#> # A tibble: 3 × 7
#>   species    mean     n    se t_statistic CI_low CI_high
#>   <fct>     <dbl> <int> <dbl>       <dbl>  <dbl>   <dbl>
#> 1 Adelie     38.8   152 0.216        1.98   38.4    39.2
#> 2 Chinstrap  48.8    68 0.405        2.00   48.0    49.6
#> 3 Gentoo     47.5   124 0.277        1.98   47.0    48.1
```

<sup>Created on 2025-04-01 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Student question: Discrepancy between manually calculated CI vs `estimate_means()` #19

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Student question: Discrepancy between manually calculated CI vs estimate_means() #19

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Student question: Discrepancy between manually calculated CI vs `estimate_means()` #19