mice

Want to become a MICE PRO?

Check out our summer course 2026 at the Utrecht Summer School: Data Science: Solving Missing Data Problems in R

Multivariate Imputation by Chained Equations

The mice package implements a method to deal with missing data. The package creates multiple imputations (replacement values) for multivariate missing data. The method is based on Fully Conditional Specification, where each incomplete variable is imputed by a separate model. The MICE algorithm can impute mixes of continuous, binary, unordered categorical and ordered categorical data. In addition, MICE can impute continuous two-level data, and maintain consistency between imputations by means of passive imputation. Many diagnostic plots are implemented to inspect the quality of the imputations.
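As a rough illustration of passive imputation, the sketch below keeps a derived variable consistent with the variables it is computed from. It uses the `boys` data that ships with mice; the choice of variables, the BMI formula string, and the small `m` and `maxit` values are assumptions made for brevity, not recommended settings.

```r
library(mice, warn.conflicts = FALSE)

# Subset of the boys data: bmi is a deterministic function of hgt and wgt
dat <- boys[, c("age", "hgt", "wgt", "bmi")]

# Start from the default methods, then impute bmi passively from hgt and wgt
meth <- make.method(dat)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"

# Rows are targets, columns are predictors:
# do not use bmi to predict hgt or wgt, to avoid circular feedback
pred <- make.predictorMatrix(dat)
pred[c("hgt", "wgt"), "bmi"] <- 0

imp_passive <- mice(dat, method = meth, predictorMatrix = pred,
                    m = 2, maxit = 2, seed = 1, printFlag = FALSE)

# In the completed data, imputed bmi values follow from the completed hgt and wgt
cmp <- complete(imp_passive, 1)
idx <- is.na(dat$bmi)
all.equal(cmp$bmi[idx], (cmp$wgt / (cmp$hgt / 100)^2)[idx])
```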

Installation

The mice package can be installed from CRAN as follows:

install.packages("mice")

The latest version can be installed from GitHub as follows:

install.packages("devtools")
devtools::install_github(repo = "amices/mice")

Minimal example

library(mice, warn.conflicts = FALSE)

# show the missing data pattern
md.pattern(nhanes)

Missing data pattern of nhanes data. Blue is observed, red is missing.

#>    age hyp bmi chl   
#> 13   1   1   1   1  0
#> 3    1   1   1   0  1
#> 1    1   1   0   1  1
#> 1    1   0   0   1  2
#> 7    1   0   0   0  3
#>      0   8   9  10 27

The table and the graph summarize where the missing data occur in the nhanes dataset.
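The margins of the pattern table can be cross-checked with base R. This is only a small sanity check on the same `nhanes` data, not part of the mice workflow itself.

```r
library(mice, warn.conflicts = FALSE)

# Number of missing values per variable (bottom row of the pattern table)
n_missing <- colSums(is.na(nhanes))
n_missing

# Total number of missing cells (bottom-right corner of the pattern table)
sum(n_missing)

# Number of fully observed rows (first row of the pattern table)
sum(complete.cases(nhanes))
```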

# multiply impute the missing values
imp <- mice(nhanes, maxit = 2, m = 2, seed = 1)
#> 
#>  iter imp variable
#>   1   1  bmi  hyp  chl
#>   1   2  bmi  hyp  chl
#>   2   1  bmi  hyp  chl
#>   2   2  bmi  hyp  chl

# inspect quality of imputations
stripplot(imp, chl, pch = 19, xlab = "Imputation number")

Distribution of chl per imputed data set.

In general, we would like the imputations to be plausible, i.e., values that could have been observed if they had not been missing.
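A simple numeric complement to the plot, assuming `chl` was imputed with the default method for numeric variables (predictive mean matching, `pmm`): because `pmm` borrows imputations from observed donors, every imputed value should also occur among the observed values. The settings below repeat the call used above, with `printFlag = FALSE` added to suppress the iteration log.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)

# Method used for chl (expected to be "pmm" under the defaults)
imp$method["chl"]

# Imputed chl values, one column per imputation
imp$imp$chl

# With pmm, all imputed values are taken from the observed chl values
observed_chl <- nhanes$chl[!is.na(nhanes$chl)]
all(unlist(imp$imp$chl) %in% observed_chl)
```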

# fit complete-data model
fit <- with(imp, lm(chl ~ age + bmi))

# pool and summarize the results
summary(pool(fit))
#>          term estimate std.error statistic    df p.value
#> 1 (Intercept)     9.08     73.09     0.124  4.50  0.9065
#> 2         age    35.23     17.46     2.017  1.36  0.2377
#> 3         bmi     4.69      1.94     2.417 15.25  0.0286

The complete-data model is fit to each imputed dataset, and the results are combined to arrive at estimates that properly account for the missing data.
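To make the combination step concrete, the sketch below recomputes the pooled estimates and standard errors by hand using Rubin's rules and compares them with the output of `pool()`. It is an illustration under the same settings as above, not a replacement for `pool()`, which also derives the degrees of freedom and p-values.

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, maxit = 2, m = 2, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(chl ~ age + bmi))
m <- length(fit$analyses)

# One row per imputed dataset: coefficient estimates and their variances
est <- t(sapply(fit$analyses, coef))
u   <- t(sapply(fit$analyses, function(f) diag(vcov(f))))

qbar <- colMeans(est)           # pooled point estimate
ubar <- colMeans(u)             # within-imputation variance
b    <- apply(est, 2, var)      # between-imputation variance
se   <- sqrt(ubar + (1 + 1 / m) * b)  # total variance, on the standard error scale

# Compare with the pooled results from mice
pooled <- summary(pool(fit))
all.equal(unname(qbar), pooled$estimate)
all.equal(unname(se), pooled$std.error)
```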

`mice 3.0`

Version 3.0 represents a major update that implements the following features:

blocks: The main algorithm iterates over blocks. A block is simply a collection of variables. In the common MICE algorithm each block is equivalent to one variable, which is still the default. The blocks argument allows mixing univariate imputation methods with multivariate imputation methods. The blocks feature bridges two seemingly disparate approaches, joint modeling and fully conditional specification, into one framework;
where: The where argument is a logical matrix of the same size as data that specifies which cells should be imputed. This opens up some new analytic possibilities;
Multivariate tests: There are new functions D1(), D2(), D3() and anova() that perform multivariate parameter tests on the repeated analyses from multiply-imputed data;
formulas: The old form argument has been redesigned and is now renamed to formulas. This provides an alternative way to specify imputation models that exploits the full power of R’s native formulas;
Better integration with the tidyverse framework, especially for packages dplyr, tibble and broom;
Improved numerical algorithms for low-level imputation functions. Better handling of duplicate variables.
Last but not least: A brand new edition AND online version of Flexible Imputation of Missing Data. Second Edition.
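A few of the features listed above can be sketched on the `nhanes` data. The specific imputation formula, the use of `make.where(nhanes, "all")` to overimpute every cell, and the values of `m` and `maxit` are assumptions chosen for illustration only.

```r
library(mice, warn.conflicts = FALSE)

# formulas: start from the default formulas and restrict the model for bmi
form <- make.formulas(nhanes)
form$bmi <- bmi ~ age + chl
imp_form <- mice(nhanes, formulas = form, m = 5, maxit = 2,
                 seed = 1, printFlag = FALSE)

# where: a logical matrix marking the cells to impute;
# here every cell is marked, so observed values are overimputed as well
w <- make.where(nhanes, "all")
imp_all <- mice(nhanes, where = w, m = 2, maxit = 2,
                seed = 1, printFlag = FALSE)
nrow(imp_all$imp$age)  # one imputation per row, although age has no missing values

# Multivariate tests: compare nested complete-data models with D1()
fit1 <- with(imp_form, lm(chl ~ age + bmi))
fit0 <- with(imp_form, lm(chl ~ age))
res <- D1(fit1, fit0)
res
```

The default `where = is.na(data)` imputes only the missing cells, so passing a custom matrix is needed only for cases such as overimputation.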

See MICE: Multivariate Imputation by Chained Equations for more resources.

I’ll be happy to take feedback and discuss suggestions. Please submit these through GitHub’s issues facility.

Resources

Theoretical background

mice: Multivariate Imputation by Chained Equations in R in the Journal of Statistical Software (S. van Buuren and Groothuis-Oudshoorn 2011).
The first application on missing blood pressure data (S. van Buuren, Boshuizen, and Knook 1999).
The term Fully Conditional Specification describes a general class of methods that specify imputation models for multivariate data as a set of conditional distributions (S. van Buuren et al. 2006).
Details about imputing mixes of numerical and categorical data can be found in (S. van Buuren 2007).
Book Flexible Imputation of Missing Data. Second Edition (Stef van Buuren 2018).

Course materials

Vignettes

Code from publications

Flexible Imputation of Missing Data. Second edition.

Acknowledgement

The cute mice sticker was designed by Jaden M. Walters. Thanks Jaden!

Code of Conduct

Please note that the mice project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

van Buuren, S. 2007. “Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification.” Statistical Methods in Medical Research 16 (3): 219–42. https://doi.org/10.1177/0962280206074463.

van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. “Multiple Imputation of Missing Blood Pressure Covariates in Survival Analysis.” Statistics in Medicine 18 (6): 681–94. https://doi.org/10.1002/(sici)1097-0258(19990330)18:6<681::aid-sim71>3.0.co;2-r.

van Buuren, S., J. P. L. Brand, C. G. M. Groothuis-Oudshoorn, and D. B. Rubin. 2006. “Fully Conditional Specification in Multivariate Imputation.” Journal of Statistical Computation and Simulation 76 (12): 1049–64. https://doi.org/10.1080/10629360600810434.

van Buuren, S., and K. Groothuis-Oudshoorn. 2011. “mice: Multivariate Imputation by Chained Equations in R.” Journal of Statistical Software 45 (3): 1–67. https://doi.org/10.18637/jss.v045.i03.

van Buuren, Stef. 2018. Flexible Imputation of Missing Data. 2nd ed. Interdisciplinary Statistics Series. Chapman and Hall/CRC. https://doi.org/10.1201/9780429492259.
