diff --git a/DESCRIPTION b/DESCRIPTION
index 30d582e..58e424c 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: modelbpp
 Title: Model BIC Posterior Probability
-Version: 0.1.5
+Version: 0.1.5.2
 Authors@R:
     c(person(given = "Shu Fai",
              family = "Cheung",
diff --git a/NEWS.md b/NEWS.md
index e4288a0..6b27bd6 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,16 @@
+# modelbpp 0.1.5.2
+
+## Miscellaneous
+
+- The default of the
+  `more_fit_measures` argument of the
+  print method for `model_set`-class
+  objects was changed to
+  `c("cfi", "rmsea", "srmr")`. (0.1.5.1)
+
+- Changed the vignettes to precomputed
+  Rmarkdown files. (0.1.5.2)
+
 # modelbpp 0.1.5
 
 ## New Features
diff --git a/R/print.model_set.R b/R/print.model_set.R
index 54bb977..69a59e8 100644
--- a/R/print.model_set.R
+++ b/R/print.model_set.R
@@ -49,7 +49,7 @@
 #' @param more_fit_measures Character
 #'   vector. To be passed to
 #'   [lavaan::fitMeasures()]. Default
-#'   is `c("cfi", "rmsea")`. Set it to
+#'   is `c("cfi", "rmsea", "srmr")`. Set it to
 #'   `NULL` to disable printing additional
 #'   fit measures.
#' @@ -107,7 +107,7 @@ print.model_set <- function(x, max_models = 20, bpp_target = NULL, target_name = "original", - more_fit_measures = c("cfi", "rmsea"), + more_fit_measures = c("cfi", "rmsea", "srmr"), fit_measures_digits = 3, short_names = FALSE, cumulative_bpp = FALSE, diff --git a/README.md b/README.md index 2ce3803..1d22d3d 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ # modelbpp: Model BIC Posterior Probability -(Version 0.1.5 updated on 2024-09-16, [release history](https://sfcheung.github.io/modelbpp/news/index.html)) +(Version 0.1.5.2 updated on 2025-08-20, [release history](https://sfcheung.github.io/modelbpp/news/index.html)) This package is for assessing model uncertainty in structural equation modeling (SEM) by the BIC posterior diff --git a/man/print.model_set.Rd b/man/print.model_set.Rd index fb2220f..c7ce81c 100644 --- a/man/print.model_set.Rd +++ b/man/print.model_set.Rd @@ -12,7 +12,7 @@ max_models = 20, bpp_target = NULL, target_name = "original", - more_fit_measures = c("cfi", "rmsea"), + more_fit_measures = c("cfi", "rmsea", "srmr"), fit_measures_digits = 3, short_names = FALSE, cumulative_bpp = FALSE, @@ -58,7 +58,7 @@ Used if \code{bpp_target} is not \code{NULL}.} \item{more_fit_measures}{Character vector. To be passed to \code{\link[lavaan:fitMeasures]{lavaan::fitMeasures()}}. Default -is \code{c("cfi", "rmsea")}. Set it to +is \code{c("cfi", "rmsea", "srmr")}. 
Set it to \code{NULL} to disable printing additional fit measures.} diff --git a/rebuild_vignettes.R b/rebuild_vignettes.R new file mode 100644 index 0000000..1de0817 --- /dev/null +++ b/rebuild_vignettes.R @@ -0,0 +1,9 @@ +# Adapted from https://www.kloppenborg.ca/2021/06/long-running-vignettes/ + +base_dir <- getwd() + +setwd("vignettes/") +knitr::knit("modelbpp.Rmd.original", output = "modelbpp.Rmd") + +setwd(base_dir) + diff --git a/vignettes/graph1-1.png b/vignettes/graph1-1.png index 4f88e09..6fd4d8e 100644 Binary files a/vignettes/graph1-1.png and b/vignettes/graph1-1.png differ diff --git a/vignettes/graph1_df2-1.png b/vignettes/graph1_df2-1.png index a9b67af..f3c128b 100644 Binary files a/vignettes/graph1_df2-1.png and b/vignettes/graph1_df2-1.png differ diff --git a/vignettes/modelbpp.Rmd b/vignettes/modelbpp.Rmd index f32c0f1..ea02412 100644 --- a/vignettes/modelbpp.Rmd +++ b/vignettes/modelbpp.Rmd @@ -1,375 +1,613 @@ ---- -title: "Get Started" -author: "Shu Fai Cheung" -date: "`r Sys.Date()`" -output: - rmarkdown::html_vignette: - number_sections: true -vignette: > - %\VignetteIndexEntry{Get Started} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} -bibliography: references.bib -csl: apa.csl ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>", - fig.width = 6, - fig.height = 6, - fig.align = "center", - fig.path = "" -) -``` - -# Introduction - -This article illustrates how to -use `model_set()` and other functions -from the package -[`modelbpp`](https://sfcheung.github.io/modelbpp/) -to: - -- Fit a set of neighboring models, - each has one more or one less degree - of freedom than the original fitted - model. - -- Compute the BIC posterior probability - (BPP), for each model [@wu_simple_2020]. - -- Use BPP to assess to what extent - each model is supported by the - data, compared to all other models - under consideration. - -# Workflow - -1. 
Fit an SEM model, the - original model, as usual in `lavaan`. - -2. Call `model_set()` on the output from - Step 1. It will automatically do the - following: - - - Enumerate the neighboring models - of the original model. - - - Fit all the models and compute their - BIC posterior probabilities (BPPs). - -3. Examine the results by: - - - printing the output of `model_set()`, or - - - generating a graph using `model_graph()`. - -# Example - -This is a sample dataset, -`dat_serial_4_weak`, -with four variables: - -```{r} -library(modelbpp) -head(dat_serial_4_weak) -``` - -## Step 1: Fit the Original Model - -Fit this original model, a serial -mediation model, with one direct path, -from `x` to `y`: - -```{r} -library(lavaan) -mod1 <- -" -m1 ~ x -m2 ~ m1 -y ~ m2 + x -" -fit1 <- sem(mod1, dat_serial_4_weak) -``` - -This the summary: - -```{r} -summary(fit1, - fit.measures = TRUE) -``` - -```{r echo = FALSE} -tmp <- fitMeasures(fit1) -fit1_cfi <- unname(tmp["cfi"]) -fit1_rmsea <- unname(tmp["rmsea"]) -``` - -The fit is acceptable, though the RMSEA -is marginal (CFI = `r formatC(fit1_cfi, 3, format = "f")`, -RMSEA = `r formatC(fit1_rmsea, 3, format = "f")`). - -## Step 2: Call `model_set()` - -Use `model_set()` to find the -neighboring models differ -from the target model by one on model -degrees of freedom, fit them, and compute -the BPPs: - -```{r results = FALSE} -out1 <- model_set(fit1) -``` - -## Step 3: Examine the Results - -To examine the results, just print -the output: - -```{r} -out1 -``` - -```{r echo = FALSE} -out1_bpp <- out1$bpp -out1_bpp_2 <- sort(out1_bpp, decreasing = TRUE)[2] -``` - -The total number of models examined, -including the original model, is `r length(out1$models)`. - -(Note: The total number of models -was 9 in previous version. Please -refer to the Note in the printout for -the changes.) - -The BIC posterior probabilities -(BPPs) suggest that -the original model is indeed the most -probable model based on BPP. 
However, -the model with the direct path -dropped, `drop: y~x`, only -has slightly lower BPP -(`r formatC(out1_bpp_2, 3, format = "f")`) - -This suggests that, with equal prior -probabilities [@wu_simple_2020], the -support for the model with the direct -and without the direct path have similar -support from the data based on BPP. - -Alternatively, we can use `model_graph()` -to visualize the BPPs and model relations -graphically: - -```{r graph1, fig.height = 8, fig.width = 8, eval = FALSE} -graph1 <- model_graph(out1) -plot(graph1) -``` - -![](graph1-1.png) - -Each node (circle) represents one model. -The larger the BPP, the larger the node. - -The arrow points from a simpler -model (a model with larger model *df*) -to a more complicated model (a model -with smaller model *df*). If two models -are connected by an arrow, then -one model can be formed from another model -by adding or removing one free parameter -(e.g., adding or removing one path). - -## Repeat Step 2 with User Prior - -In real studies, not all models are -equally probable before having data -(i.e., not all models have equal -prior probabilities). A researcher -fits the original model because - -- its - prior probability is higher than other - models, at least other neighboring - models (otherwise, it is not worthy - of collecting data assess thi original - model), but - -- the prior probability - is not so high to eliminate the need - for collecting data to see how much it is - supported by data. - -Suppose we decide that the prior probability -of the original model is .50: probable, but -still needs data to decide whether it is -empirically supported - -This can be done by setting `prior_sem_out` -to the desired prior probability when calling -`model_set()`: - -```{r results = FALSE} -out1_prior <- model_set(fit1, - prior_sem_out = .50) -``` - -The prior probabilities of all other -models are equal. 
Therefore, with -nine models and the prior of the target -model being .50, the prior probability -of the other eight model is (1 - .50) / 8 -or .0625. - -This is the printout: - -```{r} -out1_prior -``` - -If the prior of the target is set to .50, -then, taking into account both the prior -probabilities and the data, the target -model is strongly supported by the data. - -This is the output of `model_graph()`: - -```{r out1_prior, fig.height = 8, fig.width = 8, eval = FALSE} -graph1_prior <- model_graph(out1_prior) -plot(graph1_prior) -``` - -![](out1_prior-1.png) - -# Advanced Options - -## More Neighboring Models - -If desired, we can enumerate models -"farther away" from the target model. -For example, we can set the -maximum difference -in model *df* to 2, to include models -having two more or two less *df* than -the original model: - -```{r results = FALSE} -out1_df2 <- model_set(fit1, - df_change_add = 2, - df_change_drop = 2) -``` - -This is the printout. By default, when there -are more than 20 models, only the top 20 -models on BPP will be printed: - -```{r} -out1_df2 -``` - -The number of models examined, including -the original model, is `r length(out1_df2$models)`. - -This is the output of `model_graph()`: - -```{r graph1_df2, fig.height = 8, fig.width = 8, eval = FALSE} -graph1_df2 <- model_graph(out1_df2, - node_label_size = .75) -plot(graph1_df2) -``` - -![](graph1_df2-1.png) - -Note: Due to the number of nodes, -`node_label_size` is used to reduce -the size of the labels. - -## Excluding Some Parameters From the Search - -When calling `model_set()`, users can -specify parameters that must be excluded -from the list to be added (`must_not_add`), -or must not be dropped (`must_not_drop`). 
- -For example, suppose it is well -established that `m1~x` exists and should -not be dropped, we can exclude it -when calling `model_set()` - -```{r results = FALSE} -out1_no_m1_x <- model_set(fit1, - must_not_drop = "m1~x") -``` - -This is the output: - -```{r} -out1_no_m1_x -``` - -The number of models reduced to `r length(out1_df2$models)`. - -This is the plot: - -```{r out1_no_m1_x, ig.height = 8, fig.width = 8, eval = FALSE} -out1_no_m1_x <- model_graph(out1_no_m1_x) -plot(out1_no_m1_x) -``` - -![](out1_no_m1_x-1.png) - -## Models With Constraints - -If the original model has equality -constraints, they will be included in -the search for neighboring models, -by default. That is, removing one -equality constraint between two models -is considered as a model with an -increase of 1 *df*. - -## Recompute BPPs Without Refitting the Models - -Users can examine the impact of the prior -probability of the original model without -refitting the models, by using -the output of `model_set()` as the -input, using the `model_set_out` argument: - -```{r results = FALSE} -out1_new_prior <- model_set(model_set_out = out1, - prior_sem_out = .50) -``` - -The results are identical to calling -`model_set()` with the original `lavaan` -output as the input: - -```{r} -out1_new_prior -``` - -## Many Neighboring Models - -When a model has a lot of free parameters, -the number of neighboring models can be -large and it will take a long time to -fit all of them. Users can enable -parallel processing by setting `parallel` -to `TRUE` when calling `model_set()`. - -## More Options - -Please refer to the help page of -`model_set()` for options available. - -# Further Information - -For further information on other -functions, -please refer to their help pages. 
-
-# References
+---
+title: "Get Started"
+author: "Shu Fai Cheung"
+date: "2025-08-20"
+output:
+  rmarkdown::html_vignette:
+    number_sections: true
+vignette: >
+  %\VignetteIndexEntry{Get Started}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+bibliography: references.bib
+csl: apa.csl
+---
+
+
+
+# Introduction
+
+This article illustrates how to
+use `model_set()` and other functions
+from the package
+[`modelbpp`](https://sfcheung.github.io/modelbpp/)
+to:
+
+- Fit a set of neighboring models,
+  each having one more or one fewer degree
+  of freedom than the original fitted
+  model.
+
+- Compute the BIC posterior probability
+  (BPP) for each model [@wu_simple_2020].
+
+- Use BPP to assess to what extent
+  each model is supported by the
+  data, compared to all other models
+  under consideration.
+
+# Workflow
+
+1. Fit an SEM model, the
+   original model, as usual in `lavaan`.
+
+2. Call `model_set()` on the output from
+   Step 1. It will automatically do the
+   following:
+
+   - Enumerate the neighboring models
+     of the original model.
+
+   - Fit all the models and compute their
+     BIC posterior probabilities (BPPs).
+
+3. Examine the results by:
+
+   - printing the output of `model_set()`, or
+
+   - generating a graph using `model_graph()`.
+ +# Example + +This is a sample dataset, +`dat_serial_4_weak`, +with four variables: + + +``` r +library(modelbpp) +head(dat_serial_4_weak) +#> x m1 m2 y +#> 1 0.09107195 1.2632493 0.7823926 0.3252093 +#> 2 -1.96063838 -0.7745526 0.2002867 -0.6379673 +#> 3 0.20184014 0.2238152 0.2374072 0.4205998 +#> 4 -2.25521708 0.1185732 -0.1727878 0.9320889 +#> 5 -0.15321350 0.1509888 1.1251386 0.6892537 +#> 6 -2.00640303 -0.1595208 -0.1553136 -0.2364792 +``` + +## Step 1: Fit the Original Model + +Fit this original model, a serial +mediation model, with one direct path, +from `x` to `y`: + + +``` r +library(lavaan) +mod1 <- +" +m1 ~ x +m2 ~ m1 +y ~ m2 + x +" +fit1 <- sem(mod1, dat_serial_4_weak) +``` + +This the summary: + + +``` r +summary(fit1, + fit.measures = TRUE) +#> lavaan 0.6-19 ended normally after 1 iteration +#> +#> Estimator ML +#> Optimization method NLMINB +#> Number of model parameters 7 +#> +#> Number of observations 100 +#> +#> Model Test User Model: +#> +#> Test statistic 3.376 +#> Degrees of freedom 2 +#> P-value (Chi-square) 0.185 +#> +#> Model Test Baseline Model: +#> +#> Test statistic 39.011 +#> Degrees of freedom 6 +#> P-value 0.000 +#> +#> User Model versus Baseline Model: +#> +#> Comparative Fit Index (CFI) 0.958 +#> Tucker-Lewis Index (TLI) 0.875 +#> +#> Loglikelihood and Information Criteria: +#> +#> Loglikelihood user model (H0) -216.042 +#> Loglikelihood unrestricted model (H1) -214.354 +#> +#> Akaike (AIC) 446.084 +#> Bayesian (BIC) 464.320 +#> Sample-size adjusted Bayesian (SABIC) 442.212 +#> +#> Root Mean Square Error of Approximation: +#> +#> RMSEA 0.083 +#> 90 Percent confidence interval - lower 0.000 +#> 90 Percent confidence interval - upper 0.232 +#> P-value H_0: RMSEA <= 0.050 0.261 +#> P-value H_0: RMSEA >= 0.080 0.628 +#> +#> Standardized Root Mean Square Residual: +#> +#> SRMR 0.050 +#> +#> Parameter Estimates: +#> +#> Standard errors Standard +#> Information Expected +#> Information saturated (h1) model Structured +#> +#> 
Regressions: +#> Estimate Std.Err z-value P(>|z|) +#> m1 ~ +#> x 0.187 0.059 3.189 0.001 +#> m2 ~ +#> m1 0.231 0.091 2.537 0.011 +#> y ~ +#> m2 0.341 0.089 3.835 0.000 +#> x 0.113 0.052 2.188 0.029 +#> +#> Variances: +#> Estimate Std.Err z-value P(>|z|) +#> .m1 0.278 0.039 7.071 0.000 +#> .m2 0.254 0.036 7.071 0.000 +#> .y 0.213 0.030 7.071 0.000 +``` + + + +The fit is acceptable, though the RMSEA +is marginal (CFI = 0.958, +RMSEA = 0.083). + +## Step 2: Call `model_set()` + +Use `model_set()` to find the +neighboring models differ +from the target model by one on model +degrees of freedom, fit them, and compute +the BPPs: + + +``` r +out1 <- model_set(fit1) +``` + +## Step 3: Examine the Results + +To examine the results, just print +the output: + + +``` r +out1 +#> +#> Call: +#> model_set(sem_out = fit1) +#> +#> Number of model(s) fitted : 7 +#> Number of model(s) converged : 7 +#> Number of model(s) passed post.check: 7 +#> +#> The models (sorted by BPP): +#> model_df df_diff Prior BIC BPP cfi rmsea srmr +#> original 2 0 0.143 464.320 0.323 0.958 0.083 0.050 +#> drop: y~x 3 -1 0.143 464.340 0.320 0.848 0.129 0.095 +#> add: y~m1 1 1 0.143 465.900 0.147 1.000 0.000 0.018 +#> drop: m2~m1 3 -1 0.143 465.953 0.143 0.800 0.148 0.116 +#> add: m2~x 1 1 0.143 468.575 0.038 0.939 0.142 0.046 +#> drop: m1~x 3 -1 0.143 469.403 0.025 0.695 0.183 0.124 +#> drop: y~m2 3 -1 0.143 473.291 0.004 0.577 0.216 0.133 +#> +#> Note: +#> - BIC: Bayesian Information Criterion. +#> - BPP: BIC posterior probability. +#> - model_df: Model degrees of freedom. +#> - df_diff: Difference in df compared to the original/target model. +#> - To show cumulative BPPs, call print() with 'cumulative_bpp = TRUE'. +#> - At least one model has fixed.x = TRUE. The models are not checked for +#> equivalence. +#> - Since Version 0.1.3.5, the default values of exclude_feedback and +#> exclude_xy_cov changed to TRUE. Set them to FALSE to reproduce +#> results from previous versions. 
+``` + + + +The total number of models examined, +including the original model, is 7. + +(Note: The total number of models +was 9 in previous version. Please +refer to the Note in the printout for +the changes.) + +The BIC posterior probabilities +(BPPs) suggest that +the original model is indeed the most +probable model based on BPP. However, +the model with the direct path +dropped, `drop: y~x`, only +has slightly lower BPP +(0.320) + +This suggests that, with equal prior +probabilities [@wu_simple_2020], the +support for the model with the direct +and without the direct path have similar +support from the data based on BPP. + +Alternatively, we can use `model_graph()` +to visualize the BPPs and model relations +graphically: + + +``` r +graph1 <- model_graph(out1) +plot(graph1) +``` + +
+plot of chunk graph1 +

plot of chunk graph1

+
+ +Each node (circle) represents one model. +The larger the BPP, the larger the node. + +The arrow points from a simpler +model (a model with larger model *df*) +to a more complicated model (a model +with smaller model *df*). If two models +are connected by an arrow, then +one model can be formed from another model +by adding or removing one free parameter +(e.g., adding or removing one path). + +## Repeat Step 2 with User Prior + +In real studies, not all models are +equally probable before having data +(i.e., not all models have equal +prior probabilities). A researcher +fits the original model because + +- its + prior probability is higher than other + models, at least other neighboring + models (otherwise, it is not worthy + of collecting data assess thi original + model), but + +- the prior probability + is not so high to eliminate the need + for collecting data to see how much it is + supported by data. + +Suppose we decide that the prior probability +of the original model is .50: probable, but +still needs data to decide whether it is +empirically supported + +This can be done by setting `prior_sem_out` +to the desired prior probability when calling +`model_set()`: + + +``` r +out1_prior <- model_set(fit1, + prior_sem_out = .50) +``` + +The prior probabilities of all other +models are equal. Therefore, with +nine models and the prior of the target +model being .50, the prior probability +of the other eight model is (1 - .50) / 8 +or .0625. 
+ +This is the printout: + + +``` r +out1_prior +#> +#> Call: +#> model_set(sem_out = fit1, prior_sem_out = 0.5) +#> +#> Number of model(s) fitted : 7 +#> Number of model(s) converged : 7 +#> Number of model(s) passed post.check: 7 +#> +#> The models (sorted by BPP): +#> model_df df_diff Prior BIC BPP cfi rmsea srmr +#> original 2 0 0.500 464.320 0.741 0.958 0.083 0.050 +#> drop: y~x 3 -1 0.083 464.340 0.122 0.848 0.129 0.095 +#> add: y~m1 1 1 0.083 465.900 0.056 1.000 0.000 0.018 +#> drop: m2~m1 3 -1 0.083 465.953 0.055 0.800 0.148 0.116 +#> add: m2~x 1 1 0.083 468.575 0.015 0.939 0.142 0.046 +#> drop: m1~x 3 -1 0.083 469.403 0.010 0.695 0.183 0.124 +#> drop: y~m2 3 -1 0.083 473.291 0.001 0.577 0.216 0.133 +#> +#> Note: +#> - BIC: Bayesian Information Criterion. +#> - BPP: BIC posterior probability. +#> - model_df: Model degrees of freedom. +#> - df_diff: Difference in df compared to the original/target model. +#> - To show cumulative BPPs, call print() with 'cumulative_bpp = TRUE'. +#> - At least one model has fixed.x = TRUE. The models are not checked for +#> equivalence. +#> - Since Version 0.1.3.5, the default values of exclude_feedback and +#> exclude_xy_cov changed to TRUE. Set them to FALSE to reproduce +#> results from previous versions. +``` + +If the prior of the target is set to .50, +then, taking into account both the prior +probabilities and the data, the target +model is strongly supported by the data. + +This is the output of `model_graph()`: + + +``` r +graph1_prior <- model_graph(out1_prior) +plot(graph1_prior) +``` + +
+plot of chunk out1_prior +

plot of chunk out1_prior

+
+ +# Advanced Options + +## More Neighboring Models + +If desired, we can enumerate models +"farther away" from the target model. +For example, we can set the +maximum difference +in model *df* to 2, to include models +having two more or two less *df* than +the original model: + + +``` r +out1_df2 <- model_set(fit1, + df_change_add = 2, + df_change_drop = 2) +``` + +This is the printout. By default, when there +are more than 20 models, only the top 20 +models on BPP will be printed: + + +``` r +out1_df2 +#> +#> Call: +#> model_set(sem_out = fit1, df_change_add = 2, df_change_drop = 2) +#> +#> Number of model(s) fitted : 14 +#> Number of model(s) converged : 14 +#> Number of model(s) passed post.check: 14 +#> +#> The models (sorted by BPP): +#> model_df df_diff Prior BIC BPP cfi rmsea srmr +#> original 2 0 0.071 464.320 0.269 0.958 0.083 0.050 +#> drop: y~x 3 -1 0.071 464.340 0.267 0.848 0.129 0.095 +#> add: y~m1 1 1 0.071 465.900 0.122 1.000 0.000 0.018 +#> drop: m2~m1 3 -1 0.071 465.953 0.119 0.800 0.148 0.116 +#> drop: m2~m1;y~x 4 -2 0.071 465.973 0.118 0.690 0.160 0.149 +#> add: m2~x 1 1 0.071 468.575 0.032 0.939 0.142 0.046 +#> drop: m1~x 3 -1 0.071 469.403 0.021 0.695 0.183 0.124 +#> drop: m1~x;y~x 4 -2 0.071 469.423 0.021 0.585 0.185 0.144 +#> add: m2~x;y~m1 0 2 0.071 470.154 0.015 1.000 0.000 0.000 +#> drop: m1~x;m2~m1 4 -2 0.071 471.036 0.009 0.536 0.196 0.160 +#> drop: y~m2 3 -1 0.071 473.291 0.003 0.577 0.216 0.133 +#> drop: y~m2;y~x 4 -2 0.071 474.819 0.001 0.422 0.218 0.170 +#> drop: m2~m1;y~m2 4 -2 0.071 474.924 0.001 0.419 0.219 0.163 +#> drop: m1~x;y~m2 4 -2 0.071 478.374 0.000 0.314 0.238 0.183 +#> +#> Note: +#> - BIC: Bayesian Information Criterion. +#> - BPP: BIC posterior probability. +#> - model_df: Model degrees of freedom. +#> - df_diff: Difference in df compared to the original/target model. +#> - To show cumulative BPPs, call print() with 'cumulative_bpp = TRUE'. +#> - At least one model has fixed.x = TRUE. 
The models are not checked for +#> equivalence. +#> - Since Version 0.1.3.5, the default values of exclude_feedback and +#> exclude_xy_cov changed to TRUE. Set them to FALSE to reproduce +#> results from previous versions. +``` + +The number of models examined, including +the original model, is 14. + +This is the output of `model_graph()`: + + +``` r +graph1_df2 <- model_graph(out1_df2, + node_label_size = .75) +plot(graph1_df2) +``` + +
+plot of chunk graph1_df2 +

plot of chunk graph1_df2

+
+ +Note: Due to the number of nodes, +`node_label_size` is used to reduce +the size of the labels. + +## Excluding Some Parameters From the Search + +When calling `model_set()`, users can +specify parameters that must be excluded +from the list to be added (`must_not_add`), +or must not be dropped (`must_not_drop`). + +For example, suppose it is well +established that `m1~x` exists and should +not be dropped, we can exclude it +when calling `model_set()` + + +``` r +out1_no_m1_x <- model_set(fit1, + must_not_drop = "m1~x") +``` + +This is the output: + + +``` r +out1_no_m1_x +#> +#> Call: +#> model_set(sem_out = fit1, must_not_drop = "m1~x") +#> +#> Number of model(s) fitted : 6 +#> Number of model(s) converged : 6 +#> Number of model(s) passed post.check: 6 +#> +#> The models (sorted by BPP): +#> model_df df_diff Prior BIC BPP cfi rmsea srmr +#> original 2 0 0.167 464.320 0.331 0.958 0.083 0.050 +#> drop: y~x 3 -1 0.167 464.340 0.328 0.848 0.129 0.095 +#> add: y~m1 1 1 0.167 465.900 0.150 1.000 0.000 0.018 +#> drop: m2~m1 3 -1 0.167 465.953 0.147 0.800 0.148 0.116 +#> add: m2~x 1 1 0.167 468.575 0.040 0.939 0.142 0.046 +#> drop: y~m2 3 -1 0.167 473.291 0.004 0.577 0.216 0.133 +#> +#> Note: +#> - BIC: Bayesian Information Criterion. +#> - BPP: BIC posterior probability. +#> - model_df: Model degrees of freedom. +#> - df_diff: Difference in df compared to the original/target model. +#> - To show cumulative BPPs, call print() with 'cumulative_bpp = TRUE'. +#> - At least one model has fixed.x = TRUE. The models are not checked for +#> equivalence. +#> - Since Version 0.1.3.5, the default values of exclude_feedback and +#> exclude_xy_cov changed to TRUE. Set them to FALSE to reproduce +#> results from previous versions. +``` + +The number of models reduced to 14. + +This is the plot: + + +``` r +out1_no_m1_x <- model_graph(out1_no_m1_x) +plot(out1_no_m1_x) +``` + +
+plot of chunk out1_no_m1_x +

plot of chunk out1_no_m1_x

+
+ +## Models With Constraints + +If the original model has equality +constraints, they will be included in +the search for neighboring models, +by default. That is, removing one +equality constraint between two models +is considered as a model with an +increase of 1 *df*. + +## Recompute BPPs Without Refitting the Models + +Users can examine the impact of the prior +probability of the original model without +refitting the models, by using +the output of `model_set()` as the +input, using the `model_set_out` argument: + + +``` r +out1_new_prior <- model_set(model_set_out = out1, + prior_sem_out = .50) +``` + +The results are identical to calling +`model_set()` with the original `lavaan` +output as the input: + + +``` r +out1_new_prior +#> +#> Call: +#> model_set(model_set_out = out1, prior_sem_out = 0.5) +#> +#> Number of model(s) fitted : 7 +#> Number of model(s) converged : 7 +#> Number of model(s) passed post.check: 7 +#> +#> The models (sorted by BPP): +#> model_df df_diff Prior BIC BPP cfi rmsea srmr +#> original 2 0 0.500 464.320 0.741 0.958 0.083 0.050 +#> drop: y~x 3 -1 0.083 464.340 0.122 0.848 0.129 0.095 +#> add: y~m1 1 1 0.083 465.900 0.056 1.000 0.000 0.018 +#> drop: m2~m1 3 -1 0.083 465.953 0.055 0.800 0.148 0.116 +#> add: m2~x 1 1 0.083 468.575 0.015 0.939 0.142 0.046 +#> drop: m1~x 3 -1 0.083 469.403 0.010 0.695 0.183 0.124 +#> drop: y~m2 3 -1 0.083 473.291 0.001 0.577 0.216 0.133 +#> +#> Note: +#> - BIC: Bayesian Information Criterion. +#> - BPP: BIC posterior probability. +#> - model_df: Model degrees of freedom. +#> - df_diff: Difference in df compared to the original/target model. +#> - To show cumulative BPPs, call print() with 'cumulative_bpp = TRUE'. +#> - At least one model has fixed.x = TRUE. The models are not checked for +#> equivalence. +#> - Since Version 0.1.3.5, the default values of exclude_feedback and +#> exclude_xy_cov changed to TRUE. Set them to FALSE to reproduce +#> results from previous versions. 
+```
+
+## Many Neighboring Models
+
+When a model has many free parameters,
+the number of neighboring models can be
+large and it can take a long time to
+fit all of them. Users can enable
+parallel processing by setting `parallel`
+to `TRUE` when calling `model_set()`.
+
+## More Options
+
+Please refer to the help page of
+`model_set()` for the available options.
+
+# Further Information
+
+For further information on other
+functions,
+please refer to their help pages.
+
+# References
diff --git a/vignettes/modelbpp.Rmd.original b/vignettes/modelbpp.Rmd.original
new file mode 100644
index 0000000..081e575
--- /dev/null
+++ b/vignettes/modelbpp.Rmd.original
@@ -0,0 +1,369 @@
+---
+title: "Get Started"
+author: "Shu Fai Cheung"
+date: "`r Sys.Date()`"
+output:
+  rmarkdown::html_vignette:
+    number_sections: true
+vignette: >
+  %\VignetteIndexEntry{Get Started}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+bibliography: references.bib
+csl: apa.csl
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  # fig.width = 6,
+  # fig.height = 6,
+  dpi = 192,
+  fig.align = "center",
+  fig.path = "",
+  dev.args = list(bg = "transparent")
+)
+```
+
+# Introduction
+
+This article illustrates how to
+use `model_set()` and other functions
+from the package
+[`modelbpp`](https://sfcheung.github.io/modelbpp/)
+to:
+
+- Fit a set of neighboring models,
+  each having one more or one fewer degree
+  of freedom than the original fitted
+  model.
+
+- Compute the BIC posterior probability
+  (BPP) for each model [@wu_simple_2020].
+
+- Use BPP to assess to what extent
+  each model is supported by the
+  data, compared to all other models
+  under consideration.
+
+# Workflow
+
+1. Fit an SEM model, the
+   original model, as usual in `lavaan`.
+
+2. Call `model_set()` on the output from
+   Step 1. It will automatically do the
+   following:
+
+   - Enumerate the neighboring models
+     of the original model.
+
+   - Fit all the models and compute their
+     BIC posterior probabilities (BPPs).
+
+3. Examine the results by:
+
+   - printing the output of `model_set()`, or
+
+   - generating a graph using `model_graph()`.
+
+# Example
+
+This is a sample dataset,
+`dat_serial_4_weak`,
+with four variables:
+
+```{r}
+library(modelbpp)
+head(dat_serial_4_weak)
+```
+
+## Step 1: Fit the Original Model
+
+Fit this original model, a serial
+mediation model, with one direct path,
+from `x` to `y`:
+
+```{r}
+library(lavaan)
+mod1 <-
+"
+m1 ~ x
+m2 ~ m1
+y ~ m2 + x
+"
+fit1 <- sem(mod1, dat_serial_4_weak)
+```
+
+This is the summary:
+
+```{r}
+summary(fit1,
+        fit.measures = TRUE)
+```
+
+```{r echo = FALSE}
+tmp <- fitMeasures(fit1)
+fit1_cfi <- unname(tmp["cfi"])
+fit1_rmsea <- unname(tmp["rmsea"])
+```
+
+The fit is acceptable, though the RMSEA
+is marginal (CFI = `r formatC(fit1_cfi, 3, format = "f")`,
+RMSEA = `r formatC(fit1_rmsea, 3, format = "f")`).
+
+## Step 2: Call `model_set()`
+
+Use `model_set()` to find the
+neighboring models that differ
+from the target model by one in model
+degrees of freedom, fit them, and compute
+the BPPs:
+
+```{r results = FALSE}
+out1 <- model_set(fit1)
+```
+
+## Step 3: Examine the Results
+
+To examine the results, just print
+the output:
+
+```{r}
+out1
+```
+
+```{r echo = FALSE}
+out1_bpp <- out1$bpp
+out1_bpp_2 <- sort(out1_bpp, decreasing = TRUE)[2]
+```
+
+The total number of models examined,
+including the original model, is `r length(out1$models)`.
+
+(Note: The total number of models
+was 9 in previous versions. Please
+refer to the Note in the printout for
+the changes.)
+
+The BIC posterior probabilities
+(BPPs) suggest that
+the original model is indeed the most
+probable model based on BPP. 
However,
+the model with the direct path
+dropped, `drop: y~x`, has only a
+slightly lower BPP
+(`r formatC(out1_bpp_2, 3, format = "f")`).
+
+This suggests that, with equal prior
+probabilities [@wu_simple_2020], the
+models with and without the direct
+path receive similar support from
+the data based on BPP.
+
+Alternatively, we can use `model_graph()`
+to visualize the BPPs and model relations
+graphically:
+
+```{r graph1, fig.height = 8, fig.width = 8}
+graph1 <- model_graph(out1)
+plot(graph1)
+```
+
+Each node (circle) represents one model.
+The larger the BPP, the larger the node.
+
+The arrow points from a simpler
+model (a model with larger model *df*)
+to a more complicated model (a model
+with smaller model *df*). If two models
+are connected by an arrow, then
+one model can be formed from another model
+by adding or removing one free parameter
+(e.g., adding or removing one path).
+
+## Repeat Step 2 with User Prior
+
+In real studies, not all models are
+equally probable before having data
+(i.e., not all models have equal
+prior probabilities). A researcher
+fits the original model because
+
+- its
+  prior probability is higher than that of
+  other models, at least other neighboring
+  models (otherwise, it would not be worth
+  collecting data to assess this original
+  model), but
+
+- the prior probability
+  is not so high as to eliminate the need
+  for collecting data to see how much it is
+  supported by data.
+
+Suppose we decide that the prior probability
+of the original model is .50: probable, but
+still needing data to decide whether it is
+empirically supported.
+
+This can be done by setting `prior_sem_out`
+to the desired prior probability when calling
+`model_set()`:
+
+```{r results = FALSE}
+out1_prior <- model_set(fit1,
+                        prior_sem_out = .50)
+```
+
+The prior probabilities of all other
+models are equal. 
Therefore, with +nine models and the prior of the target +model being .50, the prior probability +of the other eight model is (1 - .50) / 8 +or .0625. + +This is the printout: + +```{r} +out1_prior +``` + +If the prior of the target is set to .50, +then, taking into account both the prior +probabilities and the data, the target +model is strongly supported by the data. + +This is the output of `model_graph()`: + +```{r out1_prior, fig.height = 8, fig.width = 8} +graph1_prior <- model_graph(out1_prior) +plot(graph1_prior) +``` + +# Advanced Options + +## More Neighboring Models + +If desired, we can enumerate models +"farther away" from the target model. +For example, we can set the +maximum difference +in model *df* to 2, to include models +having two more or two less *df* than +the original model: + +```{r results = FALSE} +out1_df2 <- model_set(fit1, + df_change_add = 2, + df_change_drop = 2) +``` + +This is the printout. By default, when there +are more than 20 models, only the top 20 +models on BPP will be printed: + +```{r} +out1_df2 +``` + +The number of models examined, including +the original model, is `r length(out1_df2$models)`. + +This is the output of `model_graph()`: + +```{r graph1_df2, fig.height = 8, fig.width = 8} +graph1_df2 <- model_graph(out1_df2, + node_label_size = .75) +plot(graph1_df2) +``` + +Note: Due to the number of nodes, +`node_label_size` is used to reduce +the size of the labels. + +## Excluding Some Parameters From the Search + +When calling `model_set()`, users can +specify parameters that must be excluded +from the list to be added (`must_not_add`), +or must not be dropped (`must_not_drop`). 
+
+For example, suppose it is well
+established that `m1~x` exists and should
+not be dropped. We can exclude it
+from the search when calling `model_set()`:
+
+```{r results = FALSE}
+out1_no_m1_x <- model_set(fit1,
+                          must_not_drop = "m1~x")
+```
+
+This is the output:
+
+```{r}
+out1_no_m1_x
+```
+
+The number of models is reduced to `r length(out1_no_m1_x$models)`.
+
+This is the plot:
+
+```{r out1_no_m1_x, fig.height = 6, fig.width = 8}
+graph1_no_m1_x <- model_graph(out1_no_m1_x)
+plot(graph1_no_m1_x)
+```
+
+## Models With Constraints
+
+If the original model has equality
+constraints, they are included in
+the search for neighboring models
+by default. That is, adding or releasing
+one equality constraint is treated as a
+change of one model *df*.
+
+## Recompute BPPs Without Refitting the Models
+
+Users can examine the impact of the prior
+probability of the original model without
+refitting the models, by passing
+the output of `model_set()` as the
+input through the `model_set_out` argument:
+
+```{r results = FALSE}
+out1_new_prior <- model_set(model_set_out = out1,
+                            prior_sem_out = .50)
+```
+
+The results are identical to calling
+`model_set()` with the original `lavaan`
+output as the input:
+
+```{r}
+out1_new_prior
+```
+
+## Many Neighboring Models
+
+When a model has many free parameters,
+the number of neighboring models can be
+large, and fitting all of them can take
+a long time. Users can enable
+parallel processing by setting `parallel`
+to `TRUE` when calling `model_set()`.
+
+## More Options
+
+Please refer to the help page of
+`model_set()` for the available options.
+
+# Further Information
+
+For further information on other
+functions,
+please refer to their help pages.
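For readers curious about the arithmetic behind the BPPs reported above, the following base-R sketch (not code from `modelbpp`; the BIC values are made up for illustration) computes posterior probabilities from BICs using the usual approximation, BPP proportional to prior times exp(-BIC / 2), first with equal priors and then with a .50 prior on the first model:

```r
# Hypothetical illustration: made-up BIC values for three models.
bics <- c(original = 210.1, dropped = 210.5, added = 214.3)

# Subtract the minimum BIC before exponentiating, for numerical stability.
w <- exp(-(bics - min(bics)) / 2)

# Equal priors: normalize the BIC weights directly.
bpp_equal <- w / sum(w)

# Unequal priors: .50 on the original model, the rest split equally.
priors <- c(.50, rep(.50 / (length(bics) - 1), length(bics) - 1))
bpp_prior <- (w * priors) / sum(w * priors)

round(rbind(bpp_equal, bpp_prior), 3)
```

Note how the larger prior pulls the posterior probability of the first model up relative to the equal-prior case, mirroring the `prior_sem_out` example earlier in this vignette.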
+
+# References
diff --git a/vignettes/out1_no_m1_x-1.png b/vignettes/out1_no_m1_x-1.png
index 6457c86..d374b37 100644
Binary files a/vignettes/out1_no_m1_x-1.png and b/vignettes/out1_no_m1_x-1.png differ
diff --git a/vignettes/out1_prior-1.png b/vignettes/out1_prior-1.png
index 7cd6127..f509998 100644
Binary files a/vignettes/out1_prior-1.png and b/vignettes/out1_prior-1.png differ