|Maintainer:||Julie Josse, Imke Mayer, Nicholas Tierney, Nathalie Vialaneix|
|Contact:||r-miss-tastic at clementine.wf|
|Contributions:||Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide.|
|Citation:||Julie Josse, Imke Mayer, Nicholas Tierney, Nathalie Vialaneix (2023). CRAN Task View: Missing Data. Version 2023-11-13. URL https://CRAN.R-project.org/view=MissingData.|
|Installation:||The packages from this task view can be installed automatically using the ctv package. For example, |
Missing data are very frequently found in datasets. Base R provides a few options to handle them using computations that involve only observed data (na.rm = TRUE in functions var, … or use = complete.obs|na.or.complete|pairwise.complete.obs in functions cor, …). The base package stats also contains the generic function na.action, which extracts information about the NA action used to create an object. In addition, the package ie2misc contains a dyadic operator + that behaves differently than the original + operator regarding missing data.
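As an illustration of the base R options mentioned above, here is a minimal sketch on made-up data. The vectors x, y and z and the data frame d are assumptions introduced only for this example.

```r
# Illustrative data with a few missing values (not from any real dataset)
x <- c(1, 2, NA, 4, 5, 6, 7, 8)
y <- c(2.1, NA, 6.2, 7.9, 10.1, 11.8, 14.2, 16.0)
z <- c(1, 3, 2, NA, 5, 4, 8, 6)
d <- data.frame(x = x, y = y, z = z)

# Without na.rm the result is NA; with na.rm = TRUE only observed values are used
v_all <- var(x)
v_obs <- var(x, na.rm = TRUE)

# "complete.obs" keeps only rows observed on every variable,
# "pairwise.complete.obs" uses all rows observed for each pair of variables
c_complete <- cor(d, use = "complete.obs")
c_pairwise <- cor(d, use = "pairwise.complete.obs")

# na.action() reports which rows were dropped when the model was fitted
fit <- lm(y ~ x, data = d)
dropped <- na.action(fit)
```

Here dropped records the row indices that lm removed under the default na.omit setting, which can help to check how many observations a fitted model actually used.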
These basic options are complemented by many packages on CRAN. In this task view, we focus on the most important ones, which were published more than one year ago and are regularly updated. The task view is structured into main topics:
In addition to the present task view, this reference website on missing data might also be helpful. Complementary information might also be found in TimeSeries, SpatioTemporal, Survival, and OfficialStatistics. Note that most packages for temporal and spatio-temporal interpolation and for censored data are not covered by the Missing Data task view.
If you think we have missed some important packages in this list, please e-mail the maintainers or submit an issue or pull request in the GitHub repository linked above.
Exploration of missing data
NA and fillr fill missing values in vectors according to simple predefined rules.
ampute of mice, the package simFrame, which proposes a very general framework for simulations, or the package simglm, which simulates data and missing values in simple and generalized linear regression models. Similarly, imputeTestbench provides a benchmark to evaluate univariate time series imputation.
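To make the idea of generating missing values concrete, the following base R sketch deletes cells completely at random. The helper ampute_mcar and its prop argument are hypothetical names introduced here; this is not how mice, simFrame or simglm are implemented, and it covers only the simplest missingness mechanism.

```r
# Hypothetical helper: set a given proportion of cells to NA,
# choosing cells uniformly at random and independently of the data values
ampute_mcar <- function(data, prop = 0.2) {
  stopifnot(is.data.frame(data), prop >= 0, prop <= 1)
  n_cells <- nrow(data) * ncol(data)
  n_miss <- round(prop * n_cells)
  idx <- sample.int(n_cells, n_miss)
  # Logical mask with the same shape as the data, TRUE where a value is removed
  mask <- matrix(FALSE, nrow(data), ncol(data))
  mask[idx] <- TRUE
  for (j in seq_along(data)) {
    data[mask[, j], j] <- NA
  }
  data
}

set.seed(1)
complete <- data.frame(a = rnorm(50), b = runif(50), c = rpois(50, 3))
incomplete <- ampute_mcar(complete, prop = 0.2)
```

Keeping the complete data alongside the amputed copy makes it possible to compare imputed values with the true ones when benchmarking imputation methods.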
Likelihood based approaches
em.cat for multivariate categorical data), in mix (function
em.mix for multivariate mixed categorical and continuous data). These packages also implement Bayesian approaches (with Imputation and Posterior steps) for the same models (functions
mix) and can be used to obtain imputed complete datasets or multiple imputations (functions
mix), once the model parameters have been estimated. monomvn proposes similar methods for multivariate normal and Student distributions when the missingness pattern is monotonic.
MixtComp. It can be used in combination with RMixtCompUtilities, which provides various graphical, getter, and utility functions.
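The EM principle behind these likelihood-based approaches can be sketched in base R on a deliberately simple case: a bivariate normal model in which x is fully observed and y is partly missing. This is a simpler model than the categorical and mixed models handled by the packages above, and the function em_bivnorm is a hypothetical name, not part of any of them.

```r
# EM sketch for the mean and covariance of a bivariate normal (x, y),
# assuming x is fully observed and y contains NA values
em_bivnorm <- function(x, y, max_iter = 500, tol = 1e-10) {
  miss <- is.na(y)
  # Start from complete-case estimates
  mu <- c(mean(x), mean(y[!miss]))
  S <- cov(cbind(x, y)[!miss, , drop = FALSE])
  for (iter in seq_len(max_iter)) {
    # E-step: expected y and y^2 for missing entries, given x and current parameters
    beta <- S[1, 2] / S[1, 1]
    resid_var <- S[2, 2] - S[1, 2]^2 / S[1, 1]
    ey <- ifelse(miss, mu[2] + beta * (x - mu[1]), y)
    ey2 <- ifelse(miss, ey^2 + resid_var, y^2)
    # M-step: maximum likelihood updates from the expected sufficient statistics
    mu_new <- c(mean(x), mean(ey))
    s_xy <- mean(x * ey) - mu_new[1] * mu_new[2]
    S_new <- matrix(c(mean(x^2) - mu_new[1]^2, s_xy,
                      s_xy, mean(ey2) - mu_new[2]^2), 2, 2)
    delta <- max(abs(mu_new - mu), abs(S_new - S))
    mu <- mu_new
    S <- S_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = S, iterations = iter)
}

set.seed(42)
n <- 200
x <- rnorm(n)
y_obs <- 1 + 2 * x + rnorm(n)
y_obs[sample.int(n, 60)] <- NA   # simulated data: 60 values of y removed at random
fit <- em_bivnorm(x, y_obs)
```

For this particular missingness pattern the maximum likelihood estimates also have a closed form based on the complete-case regression of y on x, which gives a convenient check that the iterations converge to the right values.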
hotdeck) and a fractional version (using weights) is provided in FHDI. StatMatch also uses hot-deck imputation to impute surveys from an external dataset.
regressionImp). iai tunes optimal imputation based on knn, tree or SVM and SurrogateRegression uses bivariate regressions to perform estimation and inference on partially missing target outcomes.
Some of the above-mentioned packages can also handle multiple imputation.
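The hot-deck and regression imputation ideas mentioned above can be sketched in base R as follows. Both helpers, impute_hotdeck and impute_regression, are hypothetical names introduced for illustration; they do not reproduce the API or the refinements (donor classes, weights, robust fits) of any package named above, and the regression helper assumes the predictors are fully observed.

```r
# Random hot deck: replace each missing value by a randomly drawn observed donor
impute_hotdeck <- function(v) {
  miss <- is.na(v)
  donors <- v[!miss]
  stopifnot(length(donors) > 0)
  v[miss] <- donors[sample.int(length(donors), sum(miss), replace = TRUE)]
  v
}

# Regression imputation: fit a linear model on complete rows and
# replace missing values of the response by their predictions
impute_regression <- function(data, formula) {
  target <- all.vars(formula)[1]      # response variable of the formula
  fit <- lm(formula, data = data)     # lm drops incomplete rows by default
  miss <- is.na(data[[target]])
  data[[target]][miss] <- predict(fit, newdata = data[miss, , drop = FALSE])
  data
}

set.seed(7)
d <- data.frame(x = rnorm(40))
d$y <- 3 + 0.5 * d$x + rnorm(40, sd = 0.2)
d$y[c(3, 11, 25)] <- NA               # simulated missing values

y_hotdeck <- impute_hotdeck(d$y)
d_reg <- impute_regression(d, y ~ x)
```

Both approaches return a single completed dataset, so the uncertainty due to imputation is not reflected in later analyses; this is the motivation for multiple imputation.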
Specific types of data
Specific application fields
|Core:||Amelia, hot.deck, imputeTS, jomo, mice, missMDA, naniar, softImpute, VIM, yaImpute.|
|Regular:||accelmissing, ade4, AeRobiology, aLFQ, alleHap, areal, bayesCT, BayesMallows, bcROCsurface, biclustermd, BIFIEsurvey, bild, BLOQ, bmem, bmemLavaan, BMTAR, bnstruct, bootImpute, brlrmr, brokenstick, brxx, bucky, CALIBERrfimpute, cat, CensMFM, cglasso, CGManalyzer, climatol, ClusPred, ClustImpute, CMF, cmfrec, cobalt, coefficientalpha, CoImp, COINr, cold, convergEU, creditmodel, CRTgeeDR, daqapo, declared, deductive, dejaVu, denoiseR, DescTools, didimputation, diyar, dlookr, dosearch, DrImpute, DTSg, DTWBI, DTWUMI, eatRep, ECLRMC, edmcr, eicm, eigenmodel, eimpute, eRm, exdex, experiment, FamEvent, FastImputation, fastLink, fauxnaif, FHDI, FILEST, filling, fillr, flare, forecast, FSMUMI, gapfill, gbmt, geneticae, gerbil, ggmice, grf, GSE, gsynth, hapassoc, Haplin, HardyWeinberg, hhsmm, Hmisc, iai, iCellR, icenReg, ie2misc, imp4p, impimp, imputeFin, imputeLCMD, imputeMulti, imputeR, imputeTestbench, IncomPair, InformativeCensoring, ipw, IPWboxplot, irrNA, Iscores, isni, isotree, iWeigReg, JointAI, lavaan, lfl, lilikoi, LNIRT, lodi, lori, LOST, lqr, ltm, LUCIDus, MatchThem, mde, mdgc, mdmb, memisc, metagear, metansue, metasens, metavcov, MGMM, mi, mi4p, miceadds, miceafter, miceFast, micemd, miceRanger, miclust, migui, MIIPW, mimi, mirt, misaem, missCompare, missForest, missingHE, missMethods, missRanger, missSBM, missSOM, misty, mitml, mitools, miWQS, mix, mixture, MixtureMissing, MKinfer, MLCIRTwithin, mlmi, MMDai, modi, momentuHMM, monomvn, mreg, mvnmle, NADIA, naivebayes, nawtilus, niaidMI, NIMAA, nipals, NIRStat, NMADiagT, norm, NPBayesImputeCat, OpenMx, OTrecod, padr, pan, paths, pCODE, phylin, PKLMtest, plsRbeta, plsRglm, ppmSuite, prefmod, PReMiuM, primePCA, prophet, pseval, psfmi, qgtools, qpNCA, QTLRel, Qtools, QUALYPSO, R6causal, randomForest, RBtest, RCAL, RcppCensSpatial, retroharmonize, RfEmpImp, Rforestry, rMIDAS, RMixtComp, RMixtCompIO, RMixtCompUtilities, RNAseqNet, rnmamod, robber, robCompositions, robustrank, robustrao, roperators, 
ROptSpace, Rphylopars, rrcovNA, rsem, rsparse, rtop, sanon, SAVER, scorecardModelUtils, semTools, sievePH, simFrame, simglm, simputation, simsem, sjlabelled, sjmisc, smcfcs, SNPassoc, SNPfiltR, SOMbrero, spacetime, StAMPP, StatMatch, StempCens, stfit, stlplus, StratifiedRF, SurrogateRegression, swgee, SynthTools, TAM, targeted, tensorBF, TestDataImputation, tidyr, timeSeries, TreeSim, TRMF, tsibble, tsrobprep, ui, VarSelLCM, wrangle, wrProteo, xts, zCompositions, zoo.|